From owner-xfs@oss.sgi.com Mon Jan 1 23:36:54 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 01 Jan 2007 23:36:59 -0800 (PST) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l027aqqw013711 for ; Mon, 1 Jan 2007 23:36:53 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id SAA06103; Tue, 2 Jan 2007 18:35:58 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l027Zw7Y80360519; Tue, 2 Jan 2007 18:35:58 +1100 (AEDT) Received: (from allanr@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id l027Zwpt80352840; Tue, 2 Jan 2007 18:35:58 +1100 (AEDT) Date: Tue, 2 Jan 2007 18:35:58 +1100 (AEDT) From: Allan Randall Message-Id: <200701020735.l027Zwpt80352840@snort.melbourne.sgi.com> To: linux-xfs@oss.sgi.com, asg-qa@melbourne.sgi.com Subject: TAKE - Dmapi QA build X-archive-position: 10156 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: allanr@snort.melbourne.sgi.com Precedence: bulk X-list: xfs Dmapi QA build fix Date: Tue Jan 2 18:34:57 AEDT 2007 Workarea: snort.melbourne.sgi.com:/home/allanr/isms/xfs-cmds-2 Inspected by: ddiss The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/xfs-cmds/master-melb Modid: master-melb:xfs-cmds:27826a xfstests/dmapi/Makefile.in - 1.9 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/xfstests/dmapi/Makefile.in.diff?r1=text&tr1=1.9&r2=text&tr2=1.8&f=h - added default make option xfstests/include/buildmacros - 1.8 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/xfstests/include/buildmacros.diff?r1=text&tr1=1.8&r2=text&tr2=1.7&f=h - removed special case for dmapi dir From owner-xfs@oss.sgi.com Tue Jan 2 03:36:05 2007 Received: with ECARTIS 
(v1.0.0; list xfs); Tue, 02 Jan 2007 03:36:10 -0800 (PST) Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l02Ba3qw010191 for ; Tue, 2 Jan 2007 03:36:04 -0800 Received: from hch by pentafluge.infradead.org with local (Exim 4.63 #1 (Red Hat Linux)) id 1H1heM-00061E-4u; Tue, 02 Jan 2007 11:17:46 +0000 Date: Tue, 2 Jan 2007 11:17:46 +0000 From: "'Christoph Hellwig'" To: "Chen, Kenneth W" Cc: "'Christoph Hellwig'" , "'Andrew Morton'" , Dmitriy Monakhov , Dmitriy Monakhov , linux-kernel@vger.kernel.org, Linux Memory Management , devel@openvz.org, xfs@oss.sgi.com Subject: Re: [PATCH] incorrect error handling inside generic_file_direct_write Message-ID: <20070102111746.GA22657@infradead.org> Mail-Followup-To: 'Christoph Hellwig' , "Chen, Kenneth W" , 'Andrew Morton' , Dmitriy Monakhov , Dmitriy Monakhov , linux-kernel@vger.kernel.org, Linux Memory Management , devel@openvz.org, xfs@oss.sgi.com References: <20061215104341.GA20089@infradead.org> <000101c7207a$48c138f0$ff0da8c0@amr.corp.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <000101c7207a$48c138f0$ff0da8c0@amr.corp.intel.com> User-Agent: Mutt/1.4.2.2i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-archive-position: 10158 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs Content-Length: 1377 Lines: 33 On Fri, Dec 15, 2006 at 10:53:18AM -0800, Chen, Kenneth W wrote: > Christoph Hellwig wrote on Friday, December 15, 2006 2:44 AM > > So we're doing the sync_page_range once in __generic_file_aio_write > > with i_mutex held. 
> >
> >
> > >  mutex_lock(&inode->i_mutex);
> > > - ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs,
> > > -                 &iocb->ki_pos);
> > > + ret = __generic_file_aio_write(iocb, iov, nr_segs, pos);
> > >  mutex_unlock(&inode->i_mutex);
> > >
> > >  if (ret > 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
> >
> > And then another time after it's unlocked, this seems wrong.
>
>
> I didn't invent that mess though.
>
> I should've asked the question first: in 2.6.20-rc1, generic_file_aio_write
> will call sync_page_range twice, once from __generic_file_aio_write_nolock
> and once within the function itself. Is it redundant? Can we delete the
> one in the top level function? Like the following?

Really? I'm looking at -rc3 now as -rc1 is rather old and it's definitely
not the case there. I also can't remember ever doing this - when I
started the generic read/write path untangling I had exactly the same
situation that's now in -rc3:

 - generic_file_aio_write_nolock calls sync_page_range_nolock
 - generic_file_aio_write calls sync_page_range
 - __generic_file_aio_write_nolock doesn't call any sync_page_range variant

From owner-xfs@oss.sgi.com Fri Jan 5 14:33:01 2007
Received: with ECARTIS (v1.0.0; list xfs); Fri, 05 Jan 2007 14:33:07 -0800 (PST)
Received: from service.eng.exegy.net (68-191-203-42.static.stls.mo.charter.com [68.191.203.42]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l05MWxqw011921 for ; Fri, 5 Jan 2007 14:33:00 -0800
Received: from HANAFORD.eng.exegy.net (hanaford.eng.exegy.net [10.19.1.4]) by service.eng.exegy.net (8.13.1/8.13.1) with ESMTP id l05MJnIS010019 for ; Fri, 5 Jan 2007 16:19:49 -0600
X-Ninja-PIM: Scanned by Ninja
X-Ninja-AttachmentFiltering: (no action)
Received: from [10.19.4.98] ([10.19.4.98]) by HANAFORD.eng.exegy.net with Microsoft SMTPSVC(6.0.3790.1830); Fri, 5 Jan 2007 16:19:48 -0600
Message-ID: <459ECF04.4090803@exegy.com>
Date: Fri, 05 Jan 2007 16:19:48 -0600
From: Dave Lloyd
User-Agent: Thunderbird 1.5.0.5
(X11/20060815) MIME-Version: 1.0 To: linux-xfs@oss.sgi.com Subject: [Fwd: [Fwd: xfs write speed regression 2.6.18.1 to 2.6.19.1]] Content-Type: multipart/mixed; boundary="------------060803090605060507010505" X-OriginalArrivalTime: 05 Jan 2007 22:19:48.0567 (UTC) FILETIME=[9CA8DE70:01C73117] X-archive-position: 10174 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dlloyd@exegy.com Precedence: bulk X-list: xfs Content-Length: 24707 Lines: 478 This is a multi-part message in MIME format. --------------060803090605060507010505 Content-Type: text/plain; charset="ISO-8859-1"; format="flowed" Content-Transfer-Encoding: 7bit From a co-worker. Anyone know what might have changed this between 2.6.18 and 2.6.19 when the issue first appeared? -- Dave Lloyd Product Support Engineer, Exegy, Inc. +1.314.450.5342 dlloyd@exegy.com -------- Original Message -------- Subject: [Fwd: xfs write speed regression 2.6.18.1 to 2.6.19.1] Date: Fri, 05 Jan 2007 16:16:11 -0600 From: Mr. Berkley Shands To: Dave Lloyd The short summary is under 2.6.18.* xfs is able to maintain a write rate of > 900MB/Sec for the first TB of data. Peak is ~256MB/Sec per raid X 4 raids. Under 2.6.19.1 this rate drops to ~220MB/Sec. and the allocations are no longer smooth starting at the outside edges. under 2.6.20-rc3 the speeds have gone back up some, but they are 10% slower than 2.6.18. and the allocations, as shown by the sequential writes (attached) are random. If I went all the way out to the inside tracks, you would be at about 490MB/Sec. Something changed. 2.6.19 was unstable, with XFS panics on a regular basis. 2.6.19.1 has not had an error yet.. (knock head on wall repeatedly). berkley -- //E. F. Berkley Shands, MSc// **Exegy Inc.** 3668 S. Geyer Road, Suite 300 St. 
Louis, MO 63127 Direct: (314) 450-5348 Cell: (314) 303-2546 Office: (314) 450-5353 Fax: (314) 450-5354 This e-mail and any documents accompanying it may contain legally privileged and/or confidential information belonging to Exegy, Inc. Such information may be protected from disclosure by law. The information is intended for use by only the addressee. If you are not the intended recipient, you are hereby notified that any disclosure or use of the information is strictly prohibited. If you have received this e-mail in error, please immediately contact the sender by e-mail or phone regarding instructions for return or destruction and do not use or disclose the content to others. --------------060803090605060507010505 Content-Type: text/plain; name="2.6.20-rc3run.txt" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="2.6.20-rc3run.txt" Data: Writing, 8192 MB, Buffer: 128 KB, Time: 58322 MS, Rate: 140.462, to /s2/GigaData.0 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 58321 MS, Rate: 140.464, to /s0/GigaData.0 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 58325 MS, Rate: 140.454, to /s1/GigaData.0 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 58356 MS, Rate: 140.380, to /s3/GigaData.0 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 43031 MS, Rate: 190.374, to /s2/GigaData.1 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 43043 MS, Rate: 190.321, to /s3/GigaData.1 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 44439 MS, Rate: 184.343, to /s1/GigaData.1 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 44439 MS, Rate: 184.343, to /s0/GigaData.1 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 38326 MS, Rate: 213.745, to /s0/GigaData.2 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 47993 MS, Rate: 170.692, to /s3/GigaData.2 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 47999 MS, Rate: 170.670, to /s1/GigaData.2 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 47998 MS, Rate: 170.674, to /s2/GigaData.2 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 46339 MS, Rate: 
176.784, to /s1/GigaData.3 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 46337 MS, Rate: 176.792, to /s2/GigaData.3 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 46333 MS, Rate: 176.807, to /s3/GigaData.3 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 46369 MS, Rate: 176.670, to /s0/GigaData.3 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 44952 MS, Rate: 182.239, to /s1/GigaData.4 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 44952 MS, Rate: 182.239, to /s0/GigaData.4 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 44952 MS, Rate: 182.239, to /s3/GigaData.4 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 44951 MS, Rate: 182.243, to /s2/GigaData.4 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 41365 MS, Rate: 198.042, to /s2/GigaData.5 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 41357 MS, Rate: 198.080, to /s3/GigaData.5 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 41365 MS, Rate: 198.042, to /s0/GigaData.5 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 41365 MS, Rate: 198.042, to /s1/GigaData.5 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 38159 MS, Rate: 214.681, to /s1/GigaData.6 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 38168 MS, Rate: 214.630, to /s0/GigaData.6 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 39051 MS, Rate: 209.777, to /s3/GigaData.6 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 39055 MS, Rate: 209.755, to /s2/GigaData.6 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 37189 MS, Rate: 220.280, to /s2/GigaData.7 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 37189 MS, Rate: 220.280, to /s0/GigaData.7 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 37190 MS, Rate: 220.274, to /s1/GigaData.7 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 37189 MS, Rate: 220.280, to /s3/GigaData.7 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 34397 MS, Rate: 238.160, to /s3/GigaData.8 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 34876 MS, Rate: 234.889, to /s0/GigaData.8 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 34870 MS, Rate: 234.930, to /s1/GigaData.8 Data: Writing, 
8192 MB, Buffer: 128 KB, Time: 34880 MS, Rate: 234.862, to /s2/GigaData.8 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 36910 MS, Rate: 221.945, to /s0/GigaData.9 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 36896 MS, Rate: 222.029, to /s1/GigaData.9 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 36909 MS, Rate: 221.951, to /s3/GigaData.9 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 36930 MS, Rate: 221.825, to /s2/GigaData.9 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 38086 MS, Rate: 215.092, to /s3/GigaData.10 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 38082 MS, Rate: 215.115, to /s1/GigaData.10 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 38079 MS, Rate: 215.132, to /s2/GigaData.10 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 38118 MS, Rate: 214.912, to /s0/GigaData.10 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 45715 MS, Rate: 179.197, to /s0/GigaData.11 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 45714 MS, Rate: 179.201, to /s3/GigaData.11 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 45709 MS, Rate: 179.221, to /s1/GigaData.11 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 45745 MS, Rate: 179.080, to /s2/GigaData.11 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 37772 MS, Rate: 216.880, to /s2/GigaData.12 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 37772 MS, Rate: 216.880, to /s0/GigaData.12 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 37957 MS, Rate: 215.823, to /s3/GigaData.12 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 37966 MS, Rate: 215.772, to /s1/GigaData.12 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 34264 MS, Rate: 239.085, to /s1/GigaData.13 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 39469 MS, Rate: 207.555, to /s0/GigaData.13 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 39469 MS, Rate: 207.555, to /s3/GigaData.13 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 39469 MS, Rate: 207.555, to /s2/GigaData.13 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 36136 MS, Rate: 226.699, to /s2/GigaData.14 Data: Writing, 8192 MB, Buffer: 128 KB, 
Time: 36137 MS, Rate: 226.693, to /s0/GigaData.14 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 35995 MS, Rate: 227.587, to /s3/GigaData.14 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 36174 MS, Rate: 226.461, to /s1/GigaData.14 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 35942 MS, Rate: 227.923, to /s2/GigaData.15 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 38528 MS, Rate: 212.625, to /s0/GigaData.15 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 38529 MS, Rate: 212.619, to /s1/GigaData.15 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 38529 MS, Rate: 212.619, to /s3/GigaData.15 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 35233 MS, Rate: 232.509, to /s1/GigaData.16 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 39777 MS, Rate: 205.948, to /s0/GigaData.16 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 39784 MS, Rate: 205.912, to /s2/GigaData.16 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 39778 MS, Rate: 205.943, to /s3/GigaData.16 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 41775 MS, Rate: 196.098, to /s3/GigaData.17 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 41767 MS, Rate: 196.136, to /s1/GigaData.17 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 41775 MS, Rate: 196.098, to /s0/GigaData.17 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 41803 MS, Rate: 195.967, to /s2/GigaData.17 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 39528 MS, Rate: 207.245, to /s0/GigaData.18 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 39529 MS, Rate: 207.240, to /s1/GigaData.18 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 39529 MS, Rate: 207.240, to /s3/GigaData.18 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 39561 MS, Rate: 207.073, to /s2/GigaData.18 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 37776 MS, Rate: 216.857, to /s2/GigaData.19 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 37779 MS, Rate: 216.840, to /s1/GigaData.19 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 37805 MS, Rate: 216.691, to /s3/GigaData.19 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 37806 MS, 
Rate: 216.685, to /s0/GigaData.19 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 38755 MS, Rate: 211.379, to /s1/GigaData.20 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 48546 MS, Rate: 168.747, to /s3/GigaData.20 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 48544 MS, Rate: 168.754, to /s0/GigaData.20 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 48536 MS, Rate: 168.782, to /s2/GigaData.20 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 41471 MS, Rate: 197.536, to /s2/GigaData.21 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 41476 MS, Rate: 197.512, to /s3/GigaData.21 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 52133 MS, Rate: 157.137, to /s1/GigaData.21 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 52135 MS, Rate: 157.131, to /s0/GigaData.21 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 49764 MS, Rate: 164.617, to /s1/GigaData.22 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 49757 MS, Rate: 164.640, to /s3/GigaData.22 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 49761 MS, Rate: 164.627, to /s2/GigaData.22 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 49764 MS, Rate: 164.617, to /s0/GigaData.22 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 39496 MS, Rate: 207.413, to /s2/GigaData.23 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 39491 MS, Rate: 207.440, to /s1/GigaData.23 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 39493 MS, Rate: 207.429, to /s3/GigaData.23 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 39490 MS, Rate: 207.445, to /s0/GigaData.23 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 41942 MS, Rate: 195.317, to /s3/GigaData.24 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 41942 MS, Rate: 195.317, to /s0/GigaData.24 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 41977 MS, Rate: 195.154, to /s1/GigaData.24 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 41983 MS, Rate: 195.127, to /s2/GigaData.24 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 38669 MS, Rate: 211.849, to /s3/GigaData.25 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 38672 MS, Rate: 211.833, to 
/s2/GigaData.25 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 39293 MS, Rate: 208.485, to /s0/GigaData.25 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 39292 MS, Rate: 208.490, to /s1/GigaData.25 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 39109 MS, Rate: 209.466, to /s2/GigaData.26 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 40761 MS, Rate: 200.976, to /s3/GigaData.26 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 40765 MS, Rate: 200.957, to /s1/GigaData.26 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 40766 MS, Rate: 200.952, to /s0/GigaData.26 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 39390 MS, Rate: 207.972, to /s2/GigaData.27 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 39397 MS, Rate: 207.935, to /s3/GigaData.27 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 39393 MS, Rate: 207.956, to /s0/GigaData.27 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 39392 MS, Rate: 207.961, to /s1/GigaData.27 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 51472 MS, Rate: 159.154, to /s2/GigaData.28 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 51471 MS, Rate: 159.158, to /s3/GigaData.28 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 51474 MS, Rate: 159.148, to /s0/GigaData.28 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 51473 MS, Rate: 159.151, to /s1/GigaData.28 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 46326 MS, Rate: 176.834, to /s3/GigaData.29 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 46327 MS, Rate: 176.830, to /s2/GigaData.29 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 46329 MS, Rate: 176.822, to /s1/GigaData.29 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 46330 MS, Rate: 176.818, to /s0/GigaData.29 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 38183 MS, Rate: 214.546, to /s3/GigaData.30 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 38183 MS, Rate: 214.546, to /s2/GigaData.30 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 38812 MS, Rate: 211.069, to /s0/GigaData.30 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 38813 MS, Rate: 211.063, to /s1/GigaData.30 
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 37634 MS, Rate: 217.676, to /s0/GigaData.31 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 37564 MS, Rate: 218.081, to /s2/GigaData.31 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 40069 MS, Rate: 204.447, to /s3/GigaData.31 Data: Writing, 8192 MB, Buffer: 128 KB, Time: 40082 MS, Rate: 204.381, to /s1/GigaData.31 Fastest for filesystem s0 234.889 MB/s /s0/GigaData.8 Slowest for filesystem s0 140.464 MB/s /s0/GigaData.0 Fastest for filesystem s1 239.085 MB/s /s1/GigaData.13 Slowest for filesystem s1 140.454 MB/s /s1/GigaData.0 Fastest for filesystem s2 234.862 MB/s /s2/GigaData.8 Slowest for filesystem s2 140.462 MB/s /s2/GigaData.0 Fastest for filesystem s3 238.160 MB/s /s3/GigaData.8 Slowest for filesystem s3 140.380 MB/s /s3/GigaData.0 Max write speed (striped): 946.996 Min write speed (striped): 561.76 --------------060803090605060507010505 Content-Type: message/rfc822; name="xfs write speed regression 2.6.18.1 to 2.6.19.1.eml" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="xfs write speed regression 2.6.18.1 to 2.6.19.1.eml" Message-ID: <45951A1B.8070206@exegy.com> Date: Fri, 29 Dec 2006 07:37:31 -0600 From: "Mr. Berkley Shands" User-Agent: Thunderbird 1.5.0.9 (X11/20061222) MIME-Version: 1.0 To: xfs-masters@oss.sgi.com, linux-kernel@vger.kernel.org CC: Dave Lloyd Subject: xfs write speed regression 2.6.18.1 to 2.6.19.1 Content-Type: multipart/alternative; boundary="------------050406070902090104070606" This is a multi-part message in MIME format. --------------050406070902090104070606 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Write speeds have decreased 10% to 30% between 2.6.18.1 and 2.6.19.1. Read speeds are unchanged at 1.15 GB/Sec. Using a SuperMicro H8DC8 or Tyan 2895 or a Tyan 2915 (Socket F) with 4 2.2GHz Opterons, 16GB RAM, dual LSI 8408E SAS controllers into 4 X 4 Raid0s. XFS file system. The only difference is the Kernel Rev. 
16 Seagate 7200.10 Sata drives, with 3.AAE firmware (very important!). LSI 8408E firmware rev is 1.02.01-0158. Adapter readahead is disabled. (Enabling readahead with this firmware costs 25% in write performance :-( ) Under 2.6.18.1, I/O peaks at 256.1 MB/Sec into each raid0 - 1GB/Sec. The average is 230 MB/Sec over the first TB. With 2.6.19.1, the peak is 220 MB/Sec, and the average is 170 MB/Sec. EXT3 runs 2-3X slower than XFS for this benchmark, so it is hard to see where the regression appeared. I'm not really too worried about it, but that much of a decrease is worth reporting. Since there were significant XFS changes between the revs, it might not be worthwhile for me to chase the exact update that causes this issue. berkley -- //E. F. Berkley Shands, MSc// **Exegy Inc.** 3668 S. Geyer Road, Suite 300 St. Louis, MO 63127 Direct: (314) 450-5348 Cell: (314) 303-2546 Office: (314) 450-5353 Fax: (314) 450-5354 This e-mail and any documents accompanying it may contain legally privileged and/or confidential information belonging to Exegy, Inc. Such information may be protected from disclosure by law. The information is intended for use by only the addressee. If you are not the intended recipient, you are hereby notified that any disclosure or use of the information is strictly prohibited. If you have received this e-mail in error, please immediately contact the sender by e-mail or phone regarding instructions for return or destruction and do not use or disclose the content to others. --------------050406070902090104070606 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Write speeds have decreased 10% to 30% between 2.6.18.1 and 2.6.19.1.
Read speeds are unchanged at 1.15 GB/Sec.

Using a SuperMicro H8DC8 or Tyan 2895 or a Tyan 2915 (Socket F)
with 4 2.2GHz Opterons, 16GB RAM, dual LSI 8408E SAS controllers
into 4 X 4 Raid0s. XFS file system. The only difference is the Kernel Rev.
16 Seagate 7200.10 Sata drives, with 3.AAE firmware (very important!).
LSI 8408E firmware rev is 1.02.01-0158. Adapter readahead is disabled.
(Enabling readahead with this firmware costs 25% in write performance :-( )

Under 2.6.18.1, I/O peaks at 256.1 MB/Sec into each raid0 - 1GB/Sec.
The average is 230 MB/Sec over the first TB. With 2.6.19.1,
the peak is 220 MB/Sec, and the average is 170 MB/Sec.
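
As a rough check of the figures above, the drop from 2.6.18.1 to 2.6.19.1 can be worked out directly. The sketch below is illustrative only: the helper names (percent_drop, aggregate_rate, print_regression_summary) are made up for this example, and it assumes all four RAID0 sets reach the quoted per-array peak at the same time.

```c
#include <stdio.h>

/* Percentage drop from an old rate to a new rate (both in MB/Sec). */
double percent_drop(double old_rate, double new_rate)
{
    return (old_rate - new_rate) / old_rate * 100.0;
}

/* Combined rate if every array sustains the same per-array rate at once
 * (an assumption made for this illustration). */
double aggregate_rate(double per_array_rate, int arrays)
{
    return per_array_rate * arrays;
}

/* Print the regression using the numbers quoted in the report. */
void print_regression_summary(void)
{
    printf("peak drop:    %.1f%%\n", percent_drop(256.1, 220.0));
    printf("average drop: %.1f%%\n", percent_drop(230.0, 170.0));
    printf("aggregate peak on 2.6.18.1: %.1f MB/Sec\n",
           aggregate_rate(256.1, 4));
}
```

Calling print_regression_summary() from a main() gives a peak drop of about 14% and an average drop of about 26%, both inside the reported 10% to 30% range. Four arrays at 256.1 MB/Sec come to 1024.4 MB/Sec, which matches the "1GB/Sec" figure.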

EXT3 runs 2-3X slower than XFS for this benchmark, so it is hard
to see where the regression appeared. I'm not really too worried
about it, but that much of a decrease is worth reporting.
Since there were significant XFS changes between the revs,
it might not be worthwhile for me to chase the exact update
that causes this issue.

berkley


 
--------------050406070902090104070606-- --------------060803090605060507010505-- From owner-xfs@oss.sgi.com Sat Jan 6 08:57:11 2007 Received: with ECARTIS (v1.0.0; list xfs); Sat, 06 Jan 2007 08:57:16 -0800 (PST) Received: from sandeen.net (sandeen.net [209.173.210.139]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l06GvAqw027991 for ; Sat, 6 Jan 2007 08:57:11 -0800 Received: from [10.0.0.4] (liberator.sandeen.net [10.0.0.4]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by sandeen.net (Postfix) with ESMTP id 3604B18011EB7; Sat, 6 Jan 2007 10:56:17 -0600 (CST) Message-ID: <459FD4B0.4000502@sandeen.net> Date: Sat, 06 Jan 2007 10:56:16 -0600 From: Eric Sandeen User-Agent: Thunderbird 1.5.0.9 (Macintosh/20061207) MIME-Version: 1.0 To: Dave Lloyd CC: linux-xfs@oss.sgi.com Subject: Re: [Fwd: [Fwd: xfs write speed regression 2.6.18.1 to 2.6.19.1]] References: <459ECF04.4090803@exegy.com> In-Reply-To: <459ECF04.4090803@exegy.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 10179 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: sandeen@sandeen.net Precedence: bulk X-list: xfs Content-Length: 242 Lines: 10 > Under 2.6.19.1 this rate drops to ~220MB/Sec. and the allocations are no > longer smooth > starting at the outside edges. What do you mean by "allocations are no longer smooth" Do you have any data showing the allocation changes? 
-Eric From owner-xfs@oss.sgi.com Sun Jan 7 13:38:42 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 07 Jan 2007 13:38:45 -0800 (PST) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l07Lccqw000700 for ; Sun, 7 Jan 2007 13:38:40 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id IAA10087; Mon, 8 Jan 2007 08:37:37 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l07Lba7Y86233986; Mon, 8 Jan 2007 08:37:36 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id l07LbZes86388742; Mon, 8 Jan 2007 08:37:35 +1100 (AEDT) Date: Mon, 8 Jan 2007 08:37:34 +1100 From: David Chinner To: linux-kernel Mailing List Cc: xfs@oss.sgi.com Subject: Re: xfs_file_ioctl / xfs_freeze: BUG: warning at kernel/mutex-debug.c:80/debug_mutex_unlock() Message-ID: <20070107213734.GS44411608@melbourne.sgi.com> References: <20070104001420.GA32440@m.safari.iki.fi> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20070104001420.GA32440@m.safari.iki.fi> User-Agent: Mutt/1.4.2.1i X-archive-position: 10181 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Content-Length: 6087 Lines: 152 On Thu, Jan 04, 2007 at 02:14:21AM +0200, Sami Farin wrote: > just a simple test I did... 
> xfs_freeze -f /mnt/newtest
> cp /etc/fstab /mnt/newtest
> xfs_freeze -u /mnt/newtest
>
> 2007-01-04 01:44:30.341979500 <4>BUG: warning at kernel/mutex-debug.c:80/debug_mutex_unlock()
> 2007-01-04 01:44:30.385771500 <4> [] dump_trace+0x215/0x21a
> 2007-01-04 01:44:30.385774500 <4> [] show_trace_log_lvl+0x1a/0x30
> 2007-01-04 01:44:30.385775500 <4> [] show_trace+0x12/0x14
> 2007-01-04 01:44:30.385777500 <4> [] dump_stack+0x19/0x1b
> 2007-01-04 01:44:30.385778500 <4> [] debug_mutex_unlock+0x69/0x120
> 2007-01-04 01:44:30.385779500 <4> [] __mutex_unlock_slowpath+0x44/0xf0
> 2007-01-04 01:44:30.385780500 <4> [] mutex_unlock+0x8/0xa
> 2007-01-04 01:44:30.385782500 <4> [] thaw_bdev+0x57/0x6e
> 2007-01-04 01:44:30.385791500 <4> [] xfs_ioctl+0x7ce/0x7d3
> 2007-01-04 01:44:30.385793500 <4> [] xfs_file_ioctl+0x33/0x54
> 2007-01-04 01:44:30.385794500 <4> [] do_ioctl+0x76/0x85
> 2007-01-04 01:44:30.385795500 <4> [] vfs_ioctl+0x59/0x1aa
> 2007-01-04 01:44:30.385796500 <4> [] sys_ioctl+0x67/0x77
> 2007-01-04 01:44:30.385797500 <4> [] syscall_call+0x7/0xb
> 2007-01-04 01:44:30.385799500 <4> [<001be410>] 0x1be410
> 2007-01-04 01:44:30.385800500 <4> =======================
>
> fstab was there just fine after -u.

Oh, that still hasn't been fixed?

Generic bug, not XFS - the global semaphore->mutex cleanup converted
the bd_mount_sem to a mutex, and mutexes complain loudly when the
process unlocking the mutex is not the process that locked it.

Basically, the generic code is broken - the bd_mount_mutex needs to
be reverted back to a semaphore because it is locked and unlocked
by different processes. The following patch does this....

BTW, Sami, can you cc xfs@oss.sgi.com on XFS bug reports in future;
you'll get more XFS savvy eyes there.....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

---

Revert bd_mount_mutex back to a semaphore so that xfs_freeze -f /mnt/newtest;
xfs_freeze -u /mnt/newtest works safely and doesn't produce lockdep warnings.
Signed-off-by: Dave Chinner --- fs/block_dev.c | 2 +- fs/buffer.c | 6 +++--- fs/gfs2/ops_fstype.c | 4 ++-- fs/super.c | 4 ++-- include/linux/fs.h | 2 +- 5 files changed, 9 insertions(+), 9 deletions(-) Index: 2.6.x-xfs-new/fs/block_dev.c =================================================================== --- 2.6.x-xfs-new.orig/fs/block_dev.c 2006-12-22 10:53:20.000000000 +1100 +++ 2.6.x-xfs-new/fs/block_dev.c 2007-01-08 08:26:15.843378600 +1100 @@ -263,7 +263,7 @@ static void init_once(void * foo, kmem_c { memset(bdev, 0, sizeof(*bdev)); mutex_init(&bdev->bd_mutex); - mutex_init(&bdev->bd_mount_mutex); + sema_init(&bdev->bd_mount_sem, 1); INIT_LIST_HEAD(&bdev->bd_inodes); INIT_LIST_HEAD(&bdev->bd_list); #ifdef CONFIG_SYSFS Index: 2.6.x-xfs-new/fs/buffer.c =================================================================== --- 2.6.x-xfs-new.orig/fs/buffer.c 2006-12-12 12:04:51.000000000 +1100 +++ 2.6.x-xfs-new/fs/buffer.c 2007-01-08 08:28:40.832542651 +1100 @@ -179,7 +179,7 @@ int fsync_bdev(struct block_device *bdev * freeze_bdev -- lock a filesystem and force it into a consistent state * @bdev: blockdevice to lock * - * This takes the block device bd_mount_mutex to make sure no new mounts + * This takes the block device bd_mount_sem to make sure no new mounts * happen on bdev until thaw_bdev() is called. * If a superblock is found on this device, we take the s_umount semaphore * on it to make sure nobody unmounts until the snapshot creation is done. 
@@ -188,7 +188,7 @@ struct super_block *freeze_bdev(struct b { struct super_block *sb; - mutex_lock(&bdev->bd_mount_mutex); + down(&bdev->bd_mount_sem); sb = get_super(bdev); if (sb && !(sb->s_flags & MS_RDONLY)) { sb->s_frozen = SB_FREEZE_WRITE; @@ -230,7 +230,7 @@ void thaw_bdev(struct block_device *bdev drop_super(sb); } - mutex_unlock(&bdev->bd_mount_mutex); + up(&bdev->bd_mount_sem); } EXPORT_SYMBOL(thaw_bdev); Index: 2.6.x-xfs-new/fs/gfs2/ops_fstype.c =================================================================== --- 2.6.x-xfs-new.orig/fs/gfs2/ops_fstype.c 2006-12-12 12:04:58.000000000 +1100 +++ 2.6.x-xfs-new/fs/gfs2/ops_fstype.c 2007-01-08 08:27:12.847973663 +1100 @@ -867,9 +867,9 @@ static int gfs2_get_sb_meta(struct file_ error = -EBUSY; goto error; } - mutex_lock(&sb->s_bdev->bd_mount_mutex); + down(&sb->s_bdev->bd_mount_sem); new = sget(fs_type, test_bdev_super, set_bdev_super, sb->s_bdev); - mutex_unlock(&sb->s_bdev->bd_mount_mutex); + up(&sb->s_bdev->bd_mount_sem); if (IS_ERR(new)) { error = PTR_ERR(new); goto error; Index: 2.6.x-xfs-new/fs/super.c =================================================================== --- 2.6.x-xfs-new.orig/fs/super.c 2006-12-22 11:45:59.000000000 +1100 +++ 2.6.x-xfs-new/fs/super.c 2007-01-08 08:24:20.718330640 +1100 @@ -736,9 +736,9 @@ int get_sb_bdev(struct file_system_type * will protect the lockfs code from trying to start a snapshot * while we are mounting */ - mutex_lock(&bdev->bd_mount_mutex); + down(&bdev->bd_mount_sem); s = sget(fs_type, test_bdev_super, set_bdev_super, bdev); - mutex_unlock(&bdev->bd_mount_mutex); + up(&bdev->bd_mount_sem); if (IS_ERR(s)) goto error_s; Index: 2.6.x-xfs-new/include/linux/fs.h =================================================================== --- 2.6.x-xfs-new.orig/include/linux/fs.h 2006-12-12 12:06:31.000000000 +1100 +++ 2.6.x-xfs-new/include/linux/fs.h 2007-01-08 08:24:53.602060200 +1100 @@ -456,7 +456,7 @@ struct block_device { struct inode * bd_inode; /* will die */ int 
bd_openers; struct mutex bd_mutex; /* open/close mutex */ - struct mutex bd_mount_mutex; /* mount mutex */ + struct semaphore bd_mount_sem; struct list_head bd_inodes; void * bd_holder; int bd_holders; From owner-xfs@oss.sgi.com Sun Jan 7 14:24:54 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 07 Jan 2007 14:25:00 -0800 (PST) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l07MOoqw008067 for ; Sun, 7 Jan 2007 14:24:53 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA11204; Mon, 8 Jan 2007 09:23:49 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l07MNk7Y86439344; Mon, 8 Jan 2007 09:23:46 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id l07MNf2H82495547; Mon, 8 Jan 2007 09:23:41 +1100 (AEDT) Date: Mon, 8 Jan 2007 09:23:41 +1100 From: David Chinner To: Hugh Dickins Cc: Sami Farin <7atbggg02@sneakemail.com>, Nathan Scott , xfs@oss.sgi.com, Nick Piggin , linux-kernel@vger.kernel.org Subject: Re: BUG: warning at mm/truncate.c:60/cancel_dirty_page() Message-ID: <20070107222341.GT33919298@melbourne.sgi.com> References: <20070106023907.GA7766@m.safari.iki.fi> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.1i X-archive-position: 10182 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Content-Length: 4187 Lines: 94 On Sat, Jan 06, 2007 at 09:11:07PM +0000, Hugh Dickins wrote: > On Sat, 6 Jan 2007, Sami Farin wrote: > > > Linux 2.6.19.1 SMP [2] on Pentium D... > > I was running dt-15.14 [2] and I ran > > "cinfo datafile" (it does mincore()). 
> > Well it went OK but when I ran "strace cinfo datafile"...: > > 04:18:48.062466 mincore(0x37f1f000, 2147266560, > > You rightly noted in a followup that there have been changes to > mincore, but I doubt they have any bearing on this: I think the > BUG just happened at the same time as your mincore. > > > ... > > 2007-01-06 04:19:03.788181500 <4>BUG: warning at mm/truncate.c:60/cancel_dirty_page() > > 2007-01-06 04:19:03.788221500 <4> [] dump_trace+0x215/0x21a > > 2007-01-06 04:19:03.788223500 <4> [] show_trace_log_lvl+0x1a/0x30 > > 2007-01-06 04:19:03.788224500 <4> [] show_trace+0x12/0x14 > > 2007-01-06 04:19:03.788225500 <4> [] dump_stack+0x19/0x1b > > 2007-01-06 04:19:03.788227500 <4> [] cancel_dirty_page+0x7e/0x80 > > 2007-01-06 04:19:03.788228500 <4> [] truncate_complete_page+0x1a/0x47 > > 2007-01-06 04:19:03.788229500 <4> [] truncate_inode_pages_range+0x114/0x2ae > > 2007-01-06 04:19:03.788245500 <4> [] truncate_inode_pages+0x1a/0x1c > > 2007-01-06 04:19:03.788247500 <4> [] fs_flushinval_pages+0x40/0x77 > > 2007-01-06 04:19:03.788248500 <4> [] xfs_write+0x8c4/0xb68 > > 2007-01-06 04:19:03.788250500 <4> [] xfs_file_aio_write+0x7e/0x95 > > 2007-01-06 04:19:03.788251500 <4> [] do_sync_write+0xca/0x119 > > 2007-01-06 04:19:03.788265500 <4> [] vfs_write+0x187/0x18c > > 2007-01-06 04:19:03.788267500 <4> [] sys_write+0x3d/0x64 > > 2007-01-06 04:19:03.788268500 <4> [] syscall_call+0x7/0xb > > 2007-01-06 04:19:03.788269500 <4> [<001cf410>] 0x1cf410 > > 2007-01-06 04:19:03.788289500 <4> ======================= > > So... XFS uses truncate_inode_pages when serving the write system call. Only when you are doing direct I/O. XFS does direct writes without the i_mutex held, so it has to invalidate the range of cached pages while holding its own locks to ensure direct I/O cache semantics are kept. > That's very inventive, Not really - been doing it for years. > and now it looks like Linus' cancel_dirty_page > and new warning have caught it out.
VM people expect it to be called > either when freeing an inode no longer in use, or when doing a truncate, > after ensuring that all pages mapped into userspace have been taken out. Ok, so we are punching a hole in the middle of the address space because we are doing direct I/O on it and need to invalidate the cache. How are you supposed to invalidate a range of pages in a mapping for this case, then? invalidate_mapping_pages() would appear to be the candidate (the generic code uses this), but it _skips_ pages that are already mapped. invalidate_mapping_pages() then advises you to use truncate_inode_pages(): /** * invalidate_mapping_pages - Invalidate all the unlocked pages of one inode * @mapping: the address_space which holds the pages to invalidate * @start: the offset 'from' which to invalidate * @end: the offset 'to' which to invalidate (inclusive) * * This function only removes the unlocked pages, if you want to * remove all the pages of one inode, you must call truncate_inode_pages. * * invalidate_mapping_pages() will not block on IO activity. It will not * invalidate pages which are dirty, locked, under writeback or mapped into * pagetables. */ We want to remove all pages within the range given, so, as directed by the comment here, we use truncate_inode_pages(). Says nothing about mappings needing to be removed first so I guess that's where we've been caught..... I think we can use invalidate_inode_pages2_range(), but that doesn't handle partial page invalidations. I think this will be ok, but it's going to need some serious fsx testing on blocksize != page size configs. So, am I correct in assuming we should be calling invalidate_inode_pages2_range() instead of truncate_inode_pages()? Cheers, Dave. 
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Sun Jan 7 14:49:08 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 07 Jan 2007 14:49:14 -0800 (PST) Received: from smtp.osdl.org (smtp.osdl.org [65.172.181.24]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l07Mn7qw012412 for ; Sun, 7 Jan 2007 14:49:08 -0800 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id l07MmCWi005515 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Sun, 7 Jan 2007 14:48:13 -0800 Received: from box (shell0.pdx.osdl.net [10.9.0.31]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with SMTP id l07MmCQm026914; Sun, 7 Jan 2007 14:48:12 -0800 Date: Sun, 7 Jan 2007 14:48:12 -0800 From: Andrew Morton To: David Chinner Cc: Hugh Dickins , Sami Farin <7atbggg02@sneakemail.com>, Nathan Scott , xfs@oss.sgi.com, Nick Piggin , linux-kernel@vger.kernel.org Subject: Re: BUG: warning at mm/truncate.c:60/cancel_dirty_page() Message-Id: <20070107144812.96357ff9.akpm@osdl.org> In-Reply-To: <20070107222341.GT33919298@melbourne.sgi.com> References: <20070106023907.GA7766@m.safari.iki.fi> <20070107222341.GT33919298@melbourne.sgi.com> X-Mailer: Sylpheed version 2.2.7 (GTK+ 2.8.17; x86_64-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.167 $ X-Scanned-By: MIMEDefang 2.36 X-archive-position: 10183 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: akpm@osdl.org Precedence: bulk X-list: xfs Content-Length: 476 Lines: 15 On Mon, 8 Jan 2007 09:23:41 +1100 David Chinner wrote: > How are you supposed to invalidate a range of pages in a mapping for > this case, then? invalidate_mapping_pages() would appear to be the > candidate (the generic code uses this), but it _skips_ pages that > are already mapped. unmap_mapping_range()? 
> So, am I correct in assuming we should be calling invalidate_inode_pages2_range() > instead of truncate_inode_pages()? That would be conventional. From owner-xfs@oss.sgi.com Sun Jan 7 15:05:48 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 07 Jan 2007 15:05:54 -0800 (PST) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l07N5jqw015757 for ; Sun, 7 Jan 2007 15:05:47 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA12109; Mon, 8 Jan 2007 10:04:43 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l07N4f7Y86367460; Mon, 8 Jan 2007 10:04:41 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id l07N4ax085800729; Mon, 8 Jan 2007 10:04:36 +1100 (AEDT) Date: Mon, 8 Jan 2007 10:04:36 +1100 From: David Chinner To: Andrew Morton Cc: David Chinner , Hugh Dickins , Sami Farin <7atbggg02@sneakemail.com>, xfs@oss.sgi.com, Nick Piggin , linux-kernel@vger.kernel.org Subject: Re: BUG: warning at mm/truncate.c:60/cancel_dirty_page() Message-ID: <20070107230436.GU33919298@melbourne.sgi.com> References: <20070106023907.GA7766@m.safari.iki.fi> <20070107222341.GT33919298@melbourne.sgi.com> <20070107144812.96357ff9.akpm@osdl.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20070107144812.96357ff9.akpm@osdl.org> User-Agent: Mutt/1.4.2.1i X-archive-position: 10184 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Content-Length: 2018 Lines: 66 On Sun, Jan 07, 2007 at 02:48:12PM -0800, Andrew Morton wrote: > On Mon, 8 Jan 2007 09:23:41 +1100 > David Chinner wrote: > > > How are you 
supposed to invalidate a range of pages in a mapping for > > this case, then? invalidate_mapping_pages() would appear to be the > > candidate (the generic code uses this), but it _skips_ pages that > > are already mapped. > > unmap_mapping_range()? /me looks at how it's used in invalidate_inode_pages2_range() and decides it's easier not to call this directly. > > So, am I correct in assuming we should be calling invalidate_inode_pages2_range() > > instead of truncate_inode_pages()? > > That would be conventional. .... in that case the following patch should fix the warning: --- fs/xfs/linux-2.6/xfs_fs_subr.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_fs_subr.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_fs_subr.c 2006-12-12 12:05:17.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_fs_subr.c 2007-01-08 09:30:22.056571711 +1100 @@ -21,6 +21,8 @@ int fs_noerr(void) { return 0; } int fs_nosys(void) { return ENOSYS; } void fs_noval(void) { return; } +#define XFS_OFF_TO_PCSIZE(off) \ + (((off) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT) void fs_tosspages( bhv_desc_t *bdp, @@ -32,7 +34,9 @@ fs_tosspages( struct inode *ip = vn_to_inode(vp); if (VN_CACHED(vp)) - truncate_inode_pages(ip->i_mapping, first); + invalidate_inode_pages2_range(ip->i_mapping, + XFS_OFF_TO_PCSIZE(first), + XFS_OFF_TO_PCSIZE(last)); } void @@ -49,7 +53,9 @@ fs_flushinval_pages( if (VN_TRUNC(vp)) VUNTRUNCATE(vp); filemap_write_and_wait(ip->i_mapping); - truncate_inode_pages(ip->i_mapping, first); + invalidate_inode_pages2_range(ip->i_mapping, + XFS_OFF_TO_PCSIZE(first), + XFS_OFF_TO_PCSIZE(last)); } } -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Sun Jan 7 15:15:08 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 07 Jan 2007 15:15:13 -0800 (PST) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com 
[134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l07NF6qw017480 for ; Sun, 7 Jan 2007 15:15:07 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA12240; Mon, 8 Jan 2007 10:14:08 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l07NE67Y84013920; Mon, 8 Jan 2007 10:14:07 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id l07NE26U86531861; Mon, 8 Jan 2007 10:14:02 +1100 (AEDT) Date: Mon, 8 Jan 2007 10:14:02 +1100 From: David Chinner To: Haar =?iso-8859-1?Q?J=E1nos?= Cc: David Chinner , linux-xfs@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: xfslogd-spinlock bug? Message-ID: <20070107231402.GU44411608@melbourne.sgi.com> References: <000d01c72127$3d7509b0$0400a8c0@dcccs> <20061217224457.GN33919298@melbourne.sgi.com> <026501c72237$0464f7a0$0400a8c0@dcccs> <20061218062444.GH44411608@melbourne.sgi.com> <027b01c7227d$0e26d1f0$0400a8c0@dcccs> <20061218223637.GP44411608@melbourne.sgi.com> <001a01c722fd$df5ca710$0400a8c0@dcccs> <20061219025229.GT33919298@melbourne.sgi.com> <20061219044700.GW33919298@melbourne.sgi.com> <041601c729b6$f81e4af0$0400a8c0@dcccs> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <041601c729b6$f81e4af0$0400a8c0@dcccs> User-Agent: Mutt/1.4.2.1i X-archive-position: 10185 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Content-Length: 1910 Lines: 53 On Wed, Dec 27, 2006 at 01:58:06PM +0100, Haar János wrote: > Hello, > > ----- Original Message ----- > From: "David Chinner" > To: "David Chinner" > Cc: "Haar János" ; ; > > Sent: Tuesday, December 19, 2006 5:47 AM > Subject: Re: 
xfslogd-spinlock bug? > > > > On Tue, Dec 19, 2006 at 01:52:29PM +1100, David Chinner wrote: > > > > The filesystem was being shutdown so xfs_inode_item_destroy() just > > frees the inode log item without removing it from the AIL. I'll fix that, > > and see if i have any luck.... > > > > So I'd still try that patch i sent in the previous email... > > I still using the patch, but didnt shows any messages at this point. > > I'v got 3 crash/reboot, but 2 causes nbd disconneted, and this one: > > Dec 27 13:41:29 dy-base BUG: warning at > kernel/mutex.c:220/__mutex_unlock_common_slowpath() > Dec 27 13:41:29 dy-base Unable to handle kernel paging request at > 0000000066604480 RIP: > Dec 27 13:41:29 dy-base [] resched_task+0x12/0x64 > Dec 27 13:41:29 dy-base PGD 115246067 PUD 0 > Dec 27 13:41:29 dy-base Oops: 0000 [1] SMP > Dec 27 13:41:29 dy-base CPU 1 > Dec 27 13:41:29 dy-base Modules linked in: nbd rd netconsole e1000 video > Dec 27 13:41:29 dy-base Pid: 4069, comm: httpd Not tainted 2.6.19 #3 > Dec 27 13:41:29 dy-base RIP: 0010:[] [] > resched_task+0x12/0x64 > Dec 27 13:41:29 dy-base RSP: 0018:ffff810105c01b78 EFLAGS: 00010083 > Dec 27 13:41:29 dy-base RAX: ffffffff807d5800 RBX: 00001749fd97c214 RCX: Different corruption in RBX here. Looks like semi-random garbage there. I wonder - what's the mac and ip address(es) of your machine and nbd servers? (i.e. I suspect this is a nbd problem, not an XFS problem) Cheers, Dave. 
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Sun Jan 7 15:34:39 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 07 Jan 2007 15:34:43 -0800 (PST) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l07NYaqw020499 for ; Sun, 7 Jan 2007 15:34:37 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA12707; Mon, 8 Jan 2007 10:33:37 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l07NXa7Y83348863; Mon, 8 Jan 2007 10:33:36 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id l07NXYaI86362939; Mon, 8 Jan 2007 10:33:34 +1100 (AEDT) Date: Mon, 8 Jan 2007 10:33:34 +1100 From: David Chinner To: Dave Lloyd Cc: linux-xfs@oss.sgi.com Subject: Re: [Fwd: [Fwd: xfs write speed regression 2.6.18.1 to 2.6.19.1]] Message-ID: <20070107233334.GW44411608@melbourne.sgi.com> References: <459ECF04.4090803@exegy.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <459ECF04.4090803@exegy.com> User-Agent: Mutt/1.4.2.1i X-archive-position: 10186 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Content-Length: 1186 Lines: 41 On Fri, Jan 05, 2007 at 04:19:48PM -0600, Dave Lloyd wrote: > From a co-worker. Anyone know what might have changed this between > 2.6.18 and 2.6.19 when the issue first appeared? IIRC, a bunch of changes went into the generic buffered I/O path to fix deadlocks on writes if we take a page fault during the copyin.
That caused a performance regression for buffered I/O of around that sort of figure, and the regression is slowly being fixed up as per: > under 2.6.20-rc3 the speeds have gone back up some, but they are 10% > slower than 2.6.18. So I don't think this is an XFS problem as such. Still, I will try to do some local tests to check it out. > and the allocations, as shown by the sequential writes (attached) are > random. ???? > If I went all the way out to the inside tracks, you would be at about > 490MB/Sec. > > Something changed. 2.6.19 was unstable, with XFS panics on a regular basis. Got any stack traces? > 2.6.19.1 has not had an error yet.. (knock head on wall repeatedly). We didn't push any changes into 2.6.19.1, so that implies bugs in the generic code, not XFS.... Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Sun Jan 7 20:04:10 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 07 Jan 2007 20:04:16 -0800 (PST) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l08447qw032484 for ; Sun, 7 Jan 2007 20:04:09 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA19029; Mon, 8 Jan 2007 15:03:11 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l0843A7Y85711269; Mon, 8 Jan 2007 15:03:10 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id l08439wn86331993; Mon, 8 Jan 2007 15:03:09 +1100 (AEDT) Date: Mon, 8 Jan 2007 15:03:09 +1100 From: David Chinner To: xfs-dev@sgi.com Cc: xfs@oss.sgi.com Subject: Review: fix mapping invalidation callouts Message-ID: <20070108040309.GX33919298@melbourne.sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: 
inline User-Agent: Mutt/1.4.2.1i X-archive-position: 10187 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Content-Length: 2150 Lines: 69 With the recent cancel_dirty_page() changes, a warning was added if we cancel a dirty page that is still mapped into the page tables. This happens in XFS from fs_tosspages() and fs_flushinval_pages() because they call truncate_inode_pages(). truncate_inode_pages() does not invalidate existing page mappings; it is expected that this is called only when truncating the file or destroying the inode and in both these cases there can be no mapped ptes. However, we call this when doing direct I/O writes to remove pages from the page cache. As a result, we can rip a page from the page cache that still has mappings attached. The correct fix is to use invalidate_inode_pages2_range() instead of truncate_inode_pages(). They essentially do the same thing, but the former also removes any pte mappings before removing the page from the page cache. Comments? Cheers, Dave.
-- Dave Chinner Principal Engineer SGI Australian Software Group --- fs/xfs/linux-2.6/xfs_fs_subr.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_fs_subr.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_fs_subr.c 2006-12-12 12:05:17.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_fs_subr.c 2007-01-08 09:30:22.056571711 +1100 @@ -21,6 +21,8 @@ int fs_noerr(void) { return 0; } int fs_nosys(void) { return ENOSYS; } void fs_noval(void) { return; } +#define XFS_OFF_TO_PCSIZE(off) \ + (((off) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT) void fs_tosspages( bhv_desc_t *bdp, @@ -32,7 +34,9 @@ fs_tosspages( struct inode *ip = vn_to_inode(vp); if (VN_CACHED(vp)) - truncate_inode_pages(ip->i_mapping, first); + invalidate_inode_pages2_range(ip->i_mapping, + XFS_OFF_TO_PCSIZE(first), + XFS_OFF_TO_PCSIZE(last)); } void @@ -49,7 +53,9 @@ fs_flushinval_pages( if (VN_TRUNC(vp)) VUNTRUNCATE(vp); filemap_write_and_wait(ip->i_mapping); - truncate_inode_pages(ip->i_mapping, first); + invalidate_inode_pages2_range(ip->i_mapping, + XFS_OFF_TO_PCSIZE(first), + XFS_OFF_TO_PCSIZE(last)); } } From owner-xfs@oss.sgi.com Sun Jan 7 20:45:28 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 07 Jan 2007 20:45:32 -0800 (PST) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l084jHqw010322 for ; Sun, 7 Jan 2007 20:45:19 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA20144; Mon, 8 Jan 2007 15:44:16 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l084iF7Y86375981; Mon, 8 Jan 2007 15:44:16 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com 
(SGI-8.12.5/8.12.5/Submit) id l084iEUf86564446; Mon, 8 Jan 2007 15:44:14 +1100 (AEDT) Date: Mon, 8 Jan 2007 15:44:14 +1100 From: David Chinner To: xfs-dev@sgi.com Cc: xfs@oss.sgi.com Subject: Review: make growing by >2TB work Message-ID: <20070108044414.GC44411608@melbourne.sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i X-archive-position: 10188 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Content-Length: 14997 Lines: 414 Growing a filesystem by > 2TB currently causes an overflow in the transaction subsystem. Make transaction deltas and associated elements explicitly 64 bit types so that we don't get overflows. Comments? Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group --- fs/xfs/xfs_bmap.c | 26 +++++++++++++------------- fs/xfs/xfs_mount.c | 18 ++++++------------ fs/xfs/xfs_mount.h | 7 ++++--- fs/xfs/xfs_trans.c | 32 ++++++++++++++++---------------- fs/xfs/xfs_trans.h | 42 +++++++++++++++++++++--------------------- 5 files changed, 60 insertions(+), 65 deletions(-) Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.c 2006-12-04 11:25:57.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_mount.c 2006-12-04 11:33:31.470695330 +1100 @@ -52,11 +52,11 @@ STATIC void xfs_unmountfs_wait(xfs_mount #ifdef HAVE_PERCPU_SB STATIC void xfs_icsb_destroy_counters(xfs_mount_t *); -STATIC void xfs_icsb_balance_counter(xfs_mount_t *, xfs_sb_field_t, int, -int); +STATIC void xfs_icsb_balance_counter(xfs_mount_t *, xfs_sb_field_t, + int, int); STATIC void xfs_icsb_sync_counters(xfs_mount_t *); STATIC int xfs_icsb_modify_counters(xfs_mount_t *, xfs_sb_field_t, - int, int); + int64_t, int); STATIC int xfs_icsb_disable_counter(xfs_mount_t *, xfs_sb_field_t); #else @@ -136,14 
+136,9 @@ xfs_mount_init(void) mp->m_flags |= XFS_MOUNT_NO_PERCPU_SB; } - AIL_LOCKINIT(&mp->m_ail_lock, "xfs_ail"); spinlock_init(&mp->m_sb_lock, "xfs_sb"); mutex_init(&mp->m_ilock); initnsema(&mp->m_growlock, 1, "xfs_grow"); - /* - * Initialize the AIL. - */ - xfs_trans_ail_init(mp); atomic_set(&mp->m_active_trans, 0); @@ -1255,7 +1250,7 @@ xfs_mod_sb(xfs_trans_t *tp, __int64_t fi */ int xfs_mod_incore_sb_unlocked(xfs_mount_t *mp, xfs_sb_field_t field, - int delta, int rsvd) + int64_t delta, int rsvd) { int scounter; /* short counter for 32 bit fields */ long long lcounter; /* long counter for 64 bit fields */ @@ -1287,7 +1282,6 @@ xfs_mod_incore_sb_unlocked(xfs_mount_t * mp->m_sb.sb_ifree = lcounter; return 0; case XFS_SBS_FDBLOCKS: - lcounter = (long long) mp->m_sb.sb_fdblocks - XFS_ALLOC_SET_ASIDE(mp); res_used = (long long)(mp->m_resblks - mp->m_resblks_avail); @@ -1418,7 +1412,7 @@ xfs_mod_incore_sb_unlocked(xfs_mount_t * * routine to do the work. */ int -xfs_mod_incore_sb(xfs_mount_t *mp, xfs_sb_field_t field, int delta, int rsvd) +xfs_mod_incore_sb(xfs_mount_t *mp, xfs_sb_field_t field, int64_t delta, int rsvd) { unsigned long s; int status; @@ -2091,7 +2085,7 @@ int xfs_icsb_modify_counters( xfs_mount_t *mp, xfs_sb_field_t field, - int delta, + int64_t delta, int rsvd) { xfs_icsb_cnts_t *icsbp; Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.h 2006-12-04 11:25:57.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_mount.h 2006-12-04 11:33:31.470695330 +1100 @@ -565,10 +565,11 @@ xfs_daddr_to_agbno(struct xfs_mount *mp, /* * This structure is for use by the xfs_mod_incore_sb_batch() routine. 
+ * xfs_growfs can specify a few fields which are more than int limit */ typedef struct xfs_mod_sb { xfs_sb_field_t msb_field; /* Field to modify, see below */ - int msb_delta; /* Change to make to specified field */ + int64_t msb_delta; /* Change to make to specified field */ } xfs_mod_sb_t; #define XFS_MOUNT_ILOCK(mp) mutex_lock(&((mp)->m_ilock)) @@ -586,9 +587,9 @@ extern int xfs_unmountfs(xfs_mount_t *, extern void xfs_unmountfs_close(xfs_mount_t *, struct cred *); extern int xfs_unmountfs_writesb(xfs_mount_t *); extern int xfs_unmount_flush(xfs_mount_t *, int); -extern int xfs_mod_incore_sb(xfs_mount_t *, xfs_sb_field_t, int, int); +extern int xfs_mod_incore_sb(xfs_mount_t *, xfs_sb_field_t, int64_t, int); extern int xfs_mod_incore_sb_unlocked(xfs_mount_t *, xfs_sb_field_t, - int, int); + int64_t, int); extern int xfs_mod_incore_sb_batch(xfs_mount_t *, xfs_mod_sb_t *, uint, int); extern struct xfs_buf *xfs_getsb(xfs_mount_t *, int); Index: 2.6.x-xfs-new/fs/xfs/xfs_trans.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_trans.c 2006-12-04 11:25:38.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_trans.c 2006-12-04 11:33:31.470695330 +1100 @@ -339,7 +339,7 @@ xfs_trans_reserve( */ if (blocks > 0) { error = xfs_mod_incore_sb(tp->t_mountp, XFS_SBS_FDBLOCKS, - -blocks, rsvd); + -((int64_t)blocks), rsvd); if (error != 0) { current_restore_flags_nested(&tp->t_pflags, PF_FSTRANS); return (XFS_ERROR(ENOSPC)); @@ -380,7 +380,7 @@ xfs_trans_reserve( */ if (rtextents > 0) { error = xfs_mod_incore_sb(tp->t_mountp, XFS_SBS_FREXTENTS, - -rtextents, rsvd); + -((int64_t)rtextents), rsvd); if (error) { error = XFS_ERROR(ENOSPC); goto undo_log; @@ -410,7 +410,7 @@ undo_log: undo_blocks: if (blocks > 0) { (void) xfs_mod_incore_sb(tp->t_mountp, XFS_SBS_FDBLOCKS, - blocks, rsvd); + (int64_t)blocks, rsvd); tp->t_blk_res = 0; } @@ -432,7 +432,7 @@ void xfs_trans_mod_sb( xfs_trans_t *tp, uint field, - long delta) + int64_t delta) { 
switch (field) { @@ -663,62 +663,62 @@ xfs_trans_unreserve_and_mod_sb( if (tp->t_flags & XFS_TRANS_SB_DIRTY) { if (tp->t_icount_delta != 0) { msbp->msb_field = XFS_SBS_ICOUNT; - msbp->msb_delta = (int)tp->t_icount_delta; + msbp->msb_delta = tp->t_icount_delta; msbp++; } if (tp->t_ifree_delta != 0) { msbp->msb_field = XFS_SBS_IFREE; - msbp->msb_delta = (int)tp->t_ifree_delta; + msbp->msb_delta = tp->t_ifree_delta; msbp++; } if (tp->t_fdblocks_delta != 0) { msbp->msb_field = XFS_SBS_FDBLOCKS; - msbp->msb_delta = (int)tp->t_fdblocks_delta; + msbp->msb_delta = tp->t_fdblocks_delta; msbp++; } if (tp->t_frextents_delta != 0) { msbp->msb_field = XFS_SBS_FREXTENTS; - msbp->msb_delta = (int)tp->t_frextents_delta; + msbp->msb_delta = tp->t_frextents_delta; msbp++; } if (tp->t_dblocks_delta != 0) { msbp->msb_field = XFS_SBS_DBLOCKS; - msbp->msb_delta = (int)tp->t_dblocks_delta; + msbp->msb_delta = tp->t_dblocks_delta; msbp++; } if (tp->t_agcount_delta != 0) { msbp->msb_field = XFS_SBS_AGCOUNT; - msbp->msb_delta = (int)tp->t_agcount_delta; + msbp->msb_delta = tp->t_agcount_delta; msbp++; } if (tp->t_imaxpct_delta != 0) { msbp->msb_field = XFS_SBS_IMAX_PCT; - msbp->msb_delta = (int)tp->t_imaxpct_delta; + msbp->msb_delta = tp->t_imaxpct_delta; msbp++; } if (tp->t_rextsize_delta != 0) { msbp->msb_field = XFS_SBS_REXTSIZE; - msbp->msb_delta = (int)tp->t_rextsize_delta; + msbp->msb_delta = tp->t_rextsize_delta; msbp++; } if (tp->t_rbmblocks_delta != 0) { msbp->msb_field = XFS_SBS_RBMBLOCKS; - msbp->msb_delta = (int)tp->t_rbmblocks_delta; + msbp->msb_delta = tp->t_rbmblocks_delta; msbp++; } if (tp->t_rblocks_delta != 0) { msbp->msb_field = XFS_SBS_RBLOCKS; - msbp->msb_delta = (int)tp->t_rblocks_delta; + msbp->msb_delta = tp->t_rblocks_delta; msbp++; } if (tp->t_rextents_delta != 0) { msbp->msb_field = XFS_SBS_REXTENTS; - msbp->msb_delta = (int)tp->t_rextents_delta; + msbp->msb_delta = tp->t_rextents_delta; msbp++; } if (tp->t_rextslog_delta != 0) { msbp->msb_field = 
XFS_SBS_REXTSLOG; - msbp->msb_delta = (int)tp->t_rextslog_delta; + msbp->msb_delta = tp->t_rextslog_delta; msbp++; } } Index: 2.6.x-xfs-new/fs/xfs/xfs_trans.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_trans.h 2006-12-04 11:25:57.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_trans.h 2006-12-04 11:33:31.474694802 +1100 @@ -350,25 +350,25 @@ typedef struct xfs_trans { xfs_trans_callback_t t_callback; /* transaction callback */ void *t_callarg; /* callback arg */ unsigned int t_flags; /* misc flags */ - long t_icount_delta; /* superblock icount change */ - long t_ifree_delta; /* superblock ifree change */ - long t_fdblocks_delta; /* superblock fdblocks chg */ - long t_res_fdblocks_delta; /* on-disk only chg */ - long t_frextents_delta;/* superblock freextents chg*/ - long t_res_frextents_delta; /* on-disk only chg */ + int64_t t_icount_delta; /* superblock icount change */ + int64_t t_ifree_delta; /* superblock ifree change */ + int64_t t_fdblocks_delta; /* superblock fdblocks chg */ + int64_t t_res_fdblocks_delta; /* on-disk only chg */ + int64_t t_frextents_delta;/* superblock freextents chg*/ + int64_t t_res_frextents_delta; /* on-disk only chg */ #ifdef DEBUG - long t_ag_freeblks_delta; /* debugging counter */ - long t_ag_flist_delta; /* debugging counter */ - long t_ag_btree_delta; /* debugging counter */ + int64_t t_ag_freeblks_delta; /* debugging counter */ + int64_t t_ag_flist_delta; /* debugging counter */ + int64_t t_ag_btree_delta; /* debugging counter */ #endif - long t_dblocks_delta;/* superblock dblocks change */ - long t_agcount_delta;/* superblock agcount change */ - long t_imaxpct_delta;/* superblock imaxpct change */ - long t_rextsize_delta;/* superblock rextsize chg */ - long t_rbmblocks_delta;/* superblock rbmblocks chg */ - long t_rblocks_delta;/* superblock rblocks change */ - long t_rextents_delta;/* superblocks rextents chg */ - long t_rextslog_delta;/* superblocks rextslog chg */ + 
int64_t t_dblocks_delta;/* superblock dblocks change */ + int64_t t_agcount_delta;/* superblock agcount change */ + int64_t t_imaxpct_delta;/* superblock imaxpct change */ + int64_t t_rextsize_delta;/* superblock rextsize chg */ + int64_t t_rbmblocks_delta;/* superblock rbmblocks chg */ + int64_t t_rblocks_delta;/* superblock rblocks change */ + int64_t t_rextents_delta;/* superblocks rextents chg */ + int64_t t_rextslog_delta;/* superblocks rextslog chg */ unsigned int t_items_free; /* log item descs free */ xfs_log_item_chunk_t t_items; /* first log item desc chunk */ xfs_trans_header_t t_header; /* header for in-log trans */ @@ -932,9 +932,9 @@ typedef struct xfs_trans { #define xfs_trans_set_sync(tp) ((tp)->t_flags |= XFS_TRANS_SYNC) #ifdef DEBUG -#define xfs_trans_agblocks_delta(tp, d) ((tp)->t_ag_freeblks_delta += (long)d) -#define xfs_trans_agflist_delta(tp, d) ((tp)->t_ag_flist_delta += (long)d) -#define xfs_trans_agbtree_delta(tp, d) ((tp)->t_ag_btree_delta += (long)d) +#define xfs_trans_agblocks_delta(tp, d) ((tp)->t_ag_freeblks_delta += (int64_t)d) +#define xfs_trans_agflist_delta(tp, d) ((tp)->t_ag_flist_delta += (int64_t)d) +#define xfs_trans_agbtree_delta(tp, d) ((tp)->t_ag_btree_delta += (int64_t)d) #else #define xfs_trans_agblocks_delta(tp, d) #define xfs_trans_agflist_delta(tp, d) @@ -950,7 +950,7 @@ xfs_trans_t *_xfs_trans_alloc(struct xfs xfs_trans_t *xfs_trans_dup(xfs_trans_t *); int xfs_trans_reserve(xfs_trans_t *, uint, uint, uint, uint, uint); -void xfs_trans_mod_sb(xfs_trans_t *, uint, long); +void xfs_trans_mod_sb(xfs_trans_t *, uint, int64_t); struct xfs_buf *xfs_trans_get_buf(xfs_trans_t *, struct xfs_buftarg *, xfs_daddr_t, int, uint); int xfs_trans_read_buf(struct xfs_mount *, xfs_trans_t *, Index: 2.6.x-xfs-new/fs/xfs/xfs_bmap.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_bmap.c 2006-12-04 11:25:38.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_bmap.c 2006-12-04 11:33:31.478694275 
+1100 @@ -684,7 +684,7 @@ xfs_bmap_add_extent( ASSERT(nblks <= da_old); if (nblks < da_old) xfs_mod_incore_sb(ip->i_mount, XFS_SBS_FDBLOCKS, - (int)(da_old - nblks), rsvd); + (int64_t)(da_old - nblks), rsvd); } /* * Clear out the allocated field, done with it now in any case. @@ -1209,7 +1209,7 @@ xfs_bmap_add_extent_delay_real( diff = (int)(temp + temp2 - STARTBLOCKVAL(PREV.br_startblock) - (cur ? cur->bc_private.b.allocated : 0)); if (diff > 0 && - xfs_mod_incore_sb(ip->i_mount, XFS_SBS_FDBLOCKS, -diff, rsvd)) { + xfs_mod_incore_sb(ip->i_mount, XFS_SBS_FDBLOCKS, -((int64_t)diff), rsvd)) { /* * Ick gross gag me with a spoon. */ @@ -1220,7 +1220,7 @@ xfs_bmap_add_extent_delay_real( diff--; if (!diff || !xfs_mod_incore_sb(ip->i_mount, - XFS_SBS_FDBLOCKS, -diff, rsvd)) + XFS_SBS_FDBLOCKS, -((int64_t)diff), rsvd)) break; } if (temp2) { @@ -1228,7 +1228,7 @@ xfs_bmap_add_extent_delay_real( diff--; if (!diff || !xfs_mod_incore_sb(ip->i_mount, - XFS_SBS_FDBLOCKS, -diff, rsvd)) + XFS_SBS_FDBLOCKS, -((int64_t)diff), rsvd)) break; } } @@ -2015,7 +2015,7 @@ xfs_bmap_add_extent_hole_delay( if (oldlen != newlen) { ASSERT(oldlen > newlen); xfs_mod_incore_sb(ip->i_mount, XFS_SBS_FDBLOCKS, - (int)(oldlen - newlen), rsvd); + (int64_t)(oldlen - newlen), rsvd); /* * Nothing to do for disk quota accounting here. */ @@ -3359,7 +3359,7 @@ xfs_bmap_del_extent( */ ASSERT(da_old >= da_new); if (da_old > da_new) - xfs_mod_incore_sb(mp, XFS_SBS_FDBLOCKS, (int)(da_old - da_new), + xfs_mod_incore_sb(mp, XFS_SBS_FDBLOCKS, (int64_t)(da_old - da_new), rsvd); if (delta) { /* DELTA: report the original extent. 
*/ @@ -4929,28 +4929,28 @@ xfs_bmapi( if (rt) { error = xfs_mod_incore_sb(mp, XFS_SBS_FREXTENTS, - -(extsz), (flags & + -((int64_t)extsz), (flags & XFS_BMAPI_RSVBLOCKS)); } else { error = xfs_mod_incore_sb(mp, XFS_SBS_FDBLOCKS, - -(alen), (flags & + -((int64_t)alen), (flags & XFS_BMAPI_RSVBLOCKS)); } if (!error) { error = xfs_mod_incore_sb(mp, XFS_SBS_FDBLOCKS, - -(indlen), (flags & + -((int64_t)indlen), (flags & XFS_BMAPI_RSVBLOCKS)); if (error && rt) xfs_mod_incore_sb(mp, XFS_SBS_FREXTENTS, - extsz, (flags & + (int64_t)extsz, (flags & XFS_BMAPI_RSVBLOCKS)); else if (error) xfs_mod_incore_sb(mp, XFS_SBS_FDBLOCKS, - alen, (flags & + (int64_t)alen, (flags & XFS_BMAPI_RSVBLOCKS)); } @@ -5616,13 +5616,13 @@ xfs_bunmapi( rtexts = XFS_FSB_TO_B(mp, del.br_blockcount); do_div(rtexts, mp->m_sb.sb_rextsize); xfs_mod_incore_sb(mp, XFS_SBS_FREXTENTS, - (int)rtexts, rsvd); + (int64_t)rtexts, rsvd); (void)XFS_TRANS_RESERVE_QUOTA_NBLKS(mp, NULL, ip, -((long)del.br_blockcount), 0, XFS_QMOPT_RES_RTBLKS); } else { xfs_mod_incore_sb(mp, XFS_SBS_FDBLOCKS, - (int)del.br_blockcount, rsvd); + (int64_t)del.br_blockcount, rsvd); (void)XFS_TRANS_RESERVE_QUOTA_NBLKS(mp, NULL, ip, -((long)del.br_blockcount), 0, XFS_QMOPT_RES_REGBLKS); From owner-xfs@oss.sgi.com Sun Jan 7 22:11:47 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 07 Jan 2007 22:11:52 -0800 (PST) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l086Bhqw020794 for ; Sun, 7 Jan 2007 22:11:45 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA22344; Mon, 8 Jan 2007 17:10:43 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l086Ag7Y86453615; Mon, 8 Jan 2007 17:10:42 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com 
(SGI-8.12.5/8.12.5/Submit) id l086Afep85364320; Mon, 8 Jan 2007 17:10:41 +1100 (AEDT) Date: Mon, 8 Jan 2007 17:10:40 +1100 From: David Chinner To: xfs-dev@sgi.com Cc: xfs@oss.sgi.com Subject: Review: fix block reservation to work with per-cpu counters Message-ID: <20070108061040.GD44411608@melbourne.sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i X-archive-position: 10189 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Content-Length: 6551 Lines: 206 Currently, XFS_IOC_SET_RESBLKS will not work properly when per-cpu superblock counters are enabled. Reservations can be lost silently as they are applied to the incore superblock instead of the currently active counters. Rather than try to shoe-horn the current reservation code into the per-cpu counters or vice-versa, we lock the superblock and snap the current counter state and work on that number. Once we work out exactly how much we need to "allocate" to the reserved area, we drop the lock and call xfs_mod_incore_sb() which will do all the right things w.r.t to the counter state. If we fail to get as much as we want (i.e. ENOSPC is returned) we go back to the start and try to allocate as much of what is left. Comments? Cheers, Dave. 
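The snapshot, compute, apply and retry sequence described above can be sketched in user space roughly as follows. This is only an illustration under stated assumptions: struct counters, mod_free_blocks(), reserve_blocks() and the race hook are hypothetical stand-ins, not the real xfs_mount_t, xfs_mod_incore_sb() or xfs_reserve_blocks(); locking is reduced to comments; and, unlike the patch, the sketch updates the reservation only after the free-block change has succeeded.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the mount's counters (not xfs_mount_t). */
struct counters {
    int64_t free_blocks; /* live free-space counter */
    int64_t reserved;    /* blocks currently held in the reserve pool */
};

/* Hypothetical hook that simulates another thread changing free space. */
typedef void (*race_hook)(struct counters *);

/* Stand-in for xfs_mod_incore_sb(): apply delta or fail with ENOSPC. */
static int mod_free_blocks(struct counters *c, int64_t delta)
{
    if (c->free_blocks + delta < 0)
        return ENOSPC;
    c->free_blocks += delta;
    return 0;
}

/* Move the reserve pool towards `request`; returns the resulting size. */
static int64_t reserve_blocks(struct counters *c, int64_t request,
                              race_hook hook)
{
    assert(request >= 0);
    for (;;) {
        /* "Locked" section: snapshot the counter and work out the delta. */
        int64_t free_snapshot = c->free_blocks;
        int64_t delta;

        if (c->reserved > request) {
            delta = c->reserved - request;      /* give blocks back */
        } else {
            int64_t want = request - c->reserved;
            int64_t take = want < free_snapshot ? want : free_snapshot;
            delta = -take;                      /* take what is available */
        }
        if (delta == 0)
            return c->reserved;                 /* nothing to do, or ENOSPC */

        /* "Lock dropped": free space may change before we apply delta. */
        if (hook)
            hook(c);

        if (mod_free_blocks(c, delta) == ENOSPC)
            continue;                           /* things changed, retry */

        c->reserved -= delta;
        return c->reserved;
    }
}

/* Simulated concurrent allocator: takes 30 blocks the first time only. */
static void steal_30_once(struct counters *c)
{
    static int done;
    if (!done && c->free_blocks >= 30) {
        c->free_blocks -= 30;
        done = 1;
    }
}
```

With 50 free blocks and a request for 40, passing steal_30_once as the hook makes the first attempt fail with ENOSPC; the retry then takes a fresh snapshot and reserves the 20 blocks that are left.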
-- Dave Chinner Principal Engineer SGI Australian Software Group --- fs/xfs/xfs_fsops.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++----- fs/xfs/xfs_mount.c | 16 ++------------- fs/xfs/xfs_mount.h | 2 - fs/xfs/xfs_vfsops.c | 2 - 4 files changed, 54 insertions(+), 20 deletions(-) Index: 2.6.x-xfs-new/fs/xfs/xfs_fsops.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_fsops.c 2006-12-12 12:05:20.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_fsops.c 2006-12-22 00:30:53.770384646 +1100 @@ -460,7 +460,7 @@ xfs_fs_counts( { unsigned long s; - xfs_icsb_sync_counters_lazy(mp); + xfs_icsb_sync_counters_flags(mp, XFS_ICSB_LAZY_COUNT); s = XFS_SB_LOCK(mp); cnt->freedata = mp->m_sb.sb_fdblocks - XFS_ALLOC_SET_ASIDE(mp); cnt->freertx = mp->m_sb.sb_frextents; @@ -491,7 +491,7 @@ xfs_reserve_blocks( __uint64_t *inval, xfs_fsop_resblks_t *outval) { - __int64_t lcounter, delta; + __int64_t lcounter, delta, fdblks_delta; __uint64_t request; unsigned long s; @@ -504,17 +504,35 @@ xfs_reserve_blocks( } request = *inval; + + /* + * With per-cpu counters, this becomes an interesting + * problem. we needto work out if we are freeing or allocation + * blocks first, then we can do the modification as necessary. + * + * We do this under the XFS_SB_LOCK so that if we are near + * ENOSPC, we will hold out any changes while we work out + * what to do. This means that the amount of free space can + * change while we do this, so we need to retry if we end up + * trying to reserve more space than is available. + * + * We also use the xfs_mod_incore_sb() interface so that we + * don't have to care about whether per cpu counter are + * enabled, disabled or even compiled in.... + */ +retry: s = XFS_SB_LOCK(mp); + xfs_icsb_sync_counters_flags(mp, XFS_ICSB_SB_LOCKED); /* * If our previous reservation was larger than the current value, * then move any unused blocks back to the free pool. 
*/ - + fdblks_delta = 0; if (mp->m_resblks > request) { lcounter = mp->m_resblks_avail - request; if (lcounter > 0) { /* release unused blocks */ - mp->m_sb.sb_fdblocks += lcounter; + fdblks_delta = lcounter; mp->m_resblks_avail -= lcounter; } mp->m_resblks = request; @@ -522,24 +540,50 @@ xfs_reserve_blocks( __int64_t free; free = mp->m_sb.sb_fdblocks - XFS_ALLOC_SET_ASIDE(mp); + if (!free) + goto out; /* ENOSPC and fdblks_delta = 0 */ + delta = request - mp->m_resblks; lcounter = free - delta; if (lcounter < 0) { /* We can't satisfy the request, just get what we can */ mp->m_resblks += free; mp->m_resblks_avail += free; + fdblks_delta = -free; mp->m_sb.sb_fdblocks = XFS_ALLOC_SET_ASIDE(mp); } else { + fdblks_delta = -delta; mp->m_sb.sb_fdblocks = lcounter + XFS_ALLOC_SET_ASIDE(mp); mp->m_resblks = request; mp->m_resblks_avail += delta; } } - +out: outval->resblks = mp->m_resblks; outval->resblks_avail = mp->m_resblks_avail; XFS_SB_UNLOCK(mp, s); + + if (fdblks_delta) { + /* + * If we are putting blocks back here, m_resblks_avail is + * already at it's max so this will put it in the free pool. + * + * If we need space, we'll either succeed in getting it + * from the free block count or we'll get an enospc. If + * we get a ENOSPC, it means things changed while we were + * calculating fdblks_delta and so we should try again to + * see if there is anything left to reserve. + * + * Don't set the reserved flag here - we don't want to reserve + * the extra reserve blocks from the reserve..... 
+ */ + int error; + error = xfs_mod_incore_sb(mp, XFS_SBS_FDBLOCKS, fdblks_delta, 0); + if (error == ENOSPC) + goto retry; + } + return 0; } Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.c 2006-12-12 18:02:03.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_mount.c 2006-12-21 22:53:35.669131775 +1100 @@ -1963,8 +1963,8 @@ xfs_icsb_enable_counter( xfs_icsb_unlock_all_counters(mp); } -STATIC void -xfs_icsb_sync_counters_int( +void +xfs_icsb_sync_counters_flags( xfs_mount_t *mp, int flags) { @@ -1996,17 +1996,7 @@ STATIC void xfs_icsb_sync_counters( xfs_mount_t *mp) { - xfs_icsb_sync_counters_int(mp, 0); -} - -/* - * lazy addition used for things like df, background sb syncs, etc - */ -void -xfs_icsb_sync_counters_lazy( - xfs_mount_t *mp) -{ - xfs_icsb_sync_counters_int(mp, XFS_ICSB_LAZY_COUNT); + xfs_icsb_sync_counters_flags(mp, 0); } /* Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.h 2006-12-20 22:59:33.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_mount.h 2006-12-21 22:52:35.596932143 +1100 @@ -312,7 +312,7 @@ typedef struct xfs_icsb_cnts { #define XFS_ICSB_LAZY_COUNT (1 << 1) /* accuracy not needed */ extern int xfs_icsb_init_counters(struct xfs_mount *); -extern void xfs_icsb_sync_counters_lazy(struct xfs_mount *); +extern void xfs_icsb_sync_counters_flags(struct xfs_mount *, int); #else #define xfs_icsb_init_counters(mp) (0) Index: 2.6.x-xfs-new/fs/xfs/xfs_vfsops.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_vfsops.c 2006-12-12 15:40:58.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_vfsops.c 2006-12-22 00:28:42.851623181 +1100 @@ -815,7 +815,7 @@ xfs_statvfs( statp->f_type = XFS_SB_MAGIC; - xfs_icsb_sync_counters_lazy(mp); + xfs_icsb_sync_counters_flags(mp, XFS_ICSB_LAZY_COUNT); s = XFS_SB_LOCK(mp); 
statp->f_bsize = sbp->sb_blocksize; lsize = sbp->sb_logstart ? sbp->sb_logblocks : 0; From owner-xfs@oss.sgi.com Mon Jan 8 01:21:11 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 08 Jan 2007 01:21:16 -0800 (PST) Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l089LAqw025065 for ; Mon, 8 Jan 2007 01:21:11 -0800 Received: from hch by pentafluge.infradead.org with local (Exim 4.63 #1 (Red Hat Linux)) id 1H3qVI-0004XJ-7x; Mon, 08 Jan 2007 09:09:16 +0000 Date: Mon, 8 Jan 2007 09:09:16 +0000 From: Christoph Hellwig To: David Chinner Cc: xfs-dev@sgi.com, xfs@oss.sgi.com Subject: Re: Review: fix mapping invalidation callouts Message-ID: <20070108090916.GA17121@infradead.org> References: <20070108040309.GX33919298@melbourne.sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20070108040309.GX33919298@melbourne.sgi.com> User-Agent: Mutt/1.4.2.2i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-archive-position: 10191 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs Content-Length: 1111 Lines: 25 On Mon, Jan 08, 2007 at 03:03:09PM +1100, David Chinner wrote: > With the recent cancel_dirty_page() changes, a warning was > added if we cancel a dirty page that is still mapped into > the page tables. > This happens in XFS from fs_tosspages() and fs_flushinval_pages() > because they call truncate_inode_pages(). > > truncate_inode_pages() does not invalidate existing page mappings; > it is expected that this is called only when truncating the file > or destroying the inode, and in both of these cases there can be > no mapped ptes. However, we call this when doing direct I/O writes > to remove pages from the page cache.
As a result, we can rip > a page from the page cache that still has mappings attached. > > The correct fix is to use invalidate_inode_pages2_range() instead > of truncate_inode_pages(). They essentially do the same thing, but > the former also removes any pte mappings before removing the page > from the page cache. > > Comments? Generally looks good. But I feel a little cautious about changes in this area, so we should throw all possible test loads at this before committing it. From owner-xfs@oss.sgi.com Mon Jan 8 01:21:09 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 08 Jan 2007 01:21:15 -0800 (PST) Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l089L8qw025051 for ; Mon, 8 Jan 2007 01:21:09 -0800 Received: from hch by pentafluge.infradead.org with local (Exim 4.63 #1 (Red Hat Linux)) id 1H3qYE-0004gm-G8; Mon, 08 Jan 2007 09:12:18 +0000 Date: Mon, 8 Jan 2007 09:12:18 +0000 From: Christoph Hellwig To: David Chinner Cc: xfs-dev@sgi.com, xfs@oss.sgi.com Subject: Re: Review: make growing by >2TB work Message-ID: <20070108091218.GB17121@infradead.org> References: <20070108044414.GC44411608@melbourne.sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20070108044414.GC44411608@melbourne.sgi.com> User-Agent: Mutt/1.4.2.2i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-archive-position: 10190 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs Content-Length: 808 Lines: 27 On Mon, Jan 08, 2007 at 03:44:14PM +1100, David Chinner wrote: > Growing a filesystem by > 2TB currently causes an overflow > in the transaction subsystem. Make transaction deltas and associated > elements explicitly 64 bit types so that we don't get overflows.
> > Comments? Looks good. > > - AIL_LOCKINIT(&mp->m_ail_lock, "xfs_ail"); > spinlock_init(&mp->m_sb_lock, "xfs_sb"); > mutex_init(&mp->m_ilock); > initnsema(&mp->m_growlock, 1, "xfs_grow"); > - /* > - * Initialize the AIL. > - */ > - xfs_trans_ail_init(mp); This seems unrelated (?) > -xfs_mod_incore_sb(xfs_mount_t *mp, xfs_sb_field_t field, int delta, int rsvd) > +xfs_mod_incore_sb(xfs_mount_t *mp, xfs_sb_field_t field, int64_t delta, int rsvd) This seems to be over 80 chars linelength with your patch, just break the line. From owner-xfs@oss.sgi.com Mon Jan 8 02:45:56 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 08 Jan 2007 02:46:00 -0800 (PST) Received: from gw02.mail.saunalahti.fi (gw02.mail.saunalahti.fi [195.197.172.116]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l08Ajsqw006620 for ; Mon, 8 Jan 2007 02:45:56 -0800 Received: from mrp2.mail.saunalahti.fi (mrp2.mail.saunalahti.fi [62.142.5.31]) by gw02.mail.saunalahti.fi (Postfix) with ESMTP id 0EDFE13978A for ; Mon, 8 Jan 2007 12:23:34 +0200 (EET) Received: from [192.168.0.151] (unknown [62.142.247.178]) (using SSLv3 with cipher RC4-MD5 (128/128 bits)) (No client certificate requested) by mrp2.mail.saunalahti.fi (Postfix) with ESMTP id E1752598004 for ; Mon, 8 Jan 2007 12:23:32 +0200 (EET) Subject: xfs_repair: corrupt inode error From: Jyrki Muukkonen To: xfs@oss.sgi.com Content-Type: text/plain Date: Mon, 08 Jan 2007 12:23:32 +0200 Message-Id: <1168251812.20568.8.camel@mustis> Mime-Version: 1.0 X-Mailer: Evolution 2.8.1 Content-Transfer-Encoding: 7bit X-archive-position: 10192 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jyrki.muukkonen@futurice.fi Precedence: bulk X-list: xfs Content-Length: 871 Lines: 25 Got this error in phase 6 when running xfs_repair 2.8.18 on ~1.2TB partition over the weekend (it took around 60 hours to get to this point :). 
On earlier versions xfs_repair aborted after ~15-20 hours with "invalid inode type" error. ... disconnected inode 4151889519, moving to lost+found disconnected inode 4151889543, moving to lost+found corrupt inode 4151889543 (btree). This is a bug. Please report it to xfs@oss.sgi.com. cache_node_purge: refcount was 1, not zero (node=0x132650d0) fatal error -- 117 - couldn't iget disconnected inode I've got the full log (both stderr and stdout) and can put that somewhere if needed. It's about 80MB uncompressed and around 7MB gzipped. Running the xfs_repair without multithreading and with -v might also be possible if that's going to help. -- Jyrki Muukkonen Futurice Oy jyrki.muukkonen@futurice.fi +358 41 501 7322 From owner-xfs@oss.sgi.com Mon Jan 8 02:52:57 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 08 Jan 2007 02:53:02 -0800 (PST) Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l08Aquqw008011 for ; Mon, 8 Jan 2007 02:52:57 -0800 Received: from hch by pentafluge.infradead.org with local (Exim 4.63 #1 (Red Hat Linux)) id 1H3s6l-00088J-FE; Mon, 08 Jan 2007 10:52:03 +0000 Date: Mon, 8 Jan 2007 10:52:03 +0000 From: Christoph Hellwig To: David Chinner Cc: xfs-dev@sgi.com, xfs@oss.sgi.com Subject: Re: Review: fix block reservation to work with per-cpu counters Message-ID: <20070108105203.GA31252@infradead.org> References: <20070108061040.GD44411608@melbourne.sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20070108061040.GD44411608@melbourne.sgi.com> User-Agent: Mutt/1.4.2.2i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-archive-position: 10193 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs Content-Length: 929 Lines: 22 
On Mon, Jan 08, 2007 at 05:10:40PM +1100, David Chinner wrote: > Currently, XFS_IOC_SET_RESBLKS will not work properly when > per-cpu superblock counters are enabled. Reservations can be lost > silently as they are applied to the incore superblock instead of > the currently active counters. > > Rather than try to shoe-horn the current reservation code into > the per-cpu counters or vice-versa, we lock the superblock > and snap the current counter state and work on that number. > Once we work out exactly how much we need to "allocate" to > the reserved area, we drop the lock and call xfs_mod_incore_sb() > which will do all the right things w.r.t to the counter state. > > If we fail to get as much as we want (i.e. ENOSPC is returned) > we go back to the start and try to allocate as much of what is > left. > > Comments? Sounds okay. Reservations shouldn't be frequent enough for this to have a performance impact. From owner-xfs@oss.sgi.com Mon Jan 8 04:14:09 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 08 Jan 2007 04:14:14 -0800 (PST) Received: from pne-smtpout4-sn1.fre.skanova.net (pne-smtpout4-sn1.fre.skanova.net [81.228.11.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l08CE5qw021469 for ; Mon, 8 Jan 2007 04:14:08 -0800 Received: from safari.iki.fi (80.223.106.128) by pne-smtpout4-sn1.fre.skanova.net (7.2.075) id 44A36A0A008DA590 for xfs@oss.sgi.com; Mon, 8 Jan 2007 12:03:24 +0100 Received: (qmail 10778 invoked by uid 500); 8 Jan 2007 11:03:23 -0000 Date: Mon, 8 Jan 2007 13:03:23 +0200 From: Sami Farin To: linux-kernel Mailing List , xfs@oss.sgi.com Subject: Re: xfs_file_ioctl / xfs_freeze: BUG: warning at kernel/mutex-debug.c:80/debug_mutex_unlock() Message-ID: <20070108110323.GA3803@m.safari.iki.fi> Mail-Followup-To: linux-kernel Mailing List , xfs@oss.sgi.com References: <20070104001420.GA32440@m.safari.iki.fi> <20070107213734.GS44411608@melbourne.sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii 
Content-Disposition: inline In-Reply-To: <20070107213734.GS44411608@melbourne.sgi.com> User-Agent: Mutt/1.5.13 (2006-08-11) X-archive-position: 10194 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: safari-xfs@safari.iki.fi Precedence: bulk X-list: xfs Content-Length: 869 Lines: 29 On Mon, Jan 08, 2007 at 08:37:34 +1100, David Chinner wrote: ... > > fstab was there just fine after -u. > > Oh, that still hasn't been fixed? Looked like it =) > Generic bug, not XFS - the global > semaphore->mutex cleanup converted the bd_mount_sem to a mutex, and > mutexes complain loudly when the process unlocking the mutex is > not the process that locked it. > > Basically, the generic code is broken - the bd_mount_mutex needs to > be reverted back to a semaphore because it is locked and unlocked > by different processes. The following patch does this.... > > BTW, Sami, can you cc xfs@oss.sgi.com on XFS bug reports in future; > you'll get more XFS savvy eyes there..... Forgot to. Thanks for the patch. It fixed the issue, no more warnings. BTW, the fix is not in 2.6.git, either. -- Do what you love because life is too short for anything else.
From owner-xfs@oss.sgi.com Mon Jan 8 05:08:46 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 08 Jan 2007 05:08:53 -0800 (PST) Received: from web31704.mail.mud.yahoo.com (web31704.mail.mud.yahoo.com [68.142.201.184]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l08D8iqw003127 for ; Mon, 8 Jan 2007 05:08:46 -0800 Received: (qmail 79111 invoked by uid 60001); 8 Jan 2007 12:41:07 -0000 DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=X-YMail-OSG:Received:Date:From:Subject:To:MIME-Version:Content-Type:Content-Transfer-Encoding:Message-ID; b=VLrCwkPST9v1+AfD3UCSbDzdDULB1OtBKId/alYx4gebrPsK4xhbvmUpRlUIUoq8tf6Sizyz0XLKrU7tw8F9bC+vbaUwsCsWwypmOAknlSw2IfeNO6F6R/jBh3RGytim5cu6h9DZ395dwbxt/mCxJmeLIV/XgYL02Gn/cTemOd0=; X-YMail-OSG: jyt.HCQVM1m9RPrZndjctcoQhySkIG.9.rx7tWMRYK3a.pSSRDmJRiGodRqnHeJsyDfdtqt1PGcoi0LkQQUbyedXiJDX80ndWG2KHNLICzkcdLof9hxOVh7WREbBD21aufMhsCvrNE9grjk- Received: from [212.150.66.71] by web31704.mail.mud.yahoo.com via HTTP; Mon, 08 Jan 2007 04:41:07 PST Date: Mon, 8 Jan 2007 04:41:07 -0800 (PST) From: Heilige Gheist Subject: kmem_alloc deadlock in SLES9 SP3 To: linux-xfs@oss.sgi.com MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit Message-ID: <674311.78396.qm@web31704.mail.mud.yahoo.com> X-archive-position: 10195 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hgheist@yahoo.com Precedence: bulk X-list: xfs Content-Length: 835 Lines: 25 I'm getting occassional system freezes preceded by spurious kmem_deadlock messages. The system is running SLES9 SP3, xfs with large (~1GB) fragmented files, using real-time section. 
The message is Jan 8 06:27:55 ce-9 kernel: XFS: possible memory allocation deadlock in kmem_alloc (mode:0x2d0) ce-9:~ # uname -a Linux ce-9 2.6.5-7.276-bigsmp #2 SMP Tue Sep 19 05:27:23 IDT 2006 i686 i686 i386 GNU/Linux The similar bug report http://oss.sgi.com/bugzilla/show_bug.cgi?id=410 recommends upgrading to 2.6.17 to make use of new incore extent management code. Is there a version of commercial Linux (RHEL/SLES) that incorporates this fix? SLES10 is based on 2.6.16 kernel. --alan From owner-xfs@oss.sgi.com Mon Jan 8 05:08:49 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 08 Jan 2007 05:08:57 -0800 (PST) Received: from pne-smtpout4-sn2.hy.skanova.net (pne-smtpout4-sn2.hy.skanova.net [81.228.8.154]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l08D8jqw003133 for ; Mon, 8 Jan 2007 05:08:49 -0800 Received: from safari.iki.fi (80.223.106.128) by pne-smtpout4-sn2.hy.skanova.net (7.2.075) id 44A2EAB8008C1173 for xfs@oss.sgi.com; Mon, 8 Jan 2007 12:58:18 +0100 Received: (qmail 12273 invoked by uid 500); 8 Jan 2007 11:58:17 -0000 Date: Mon, 8 Jan 2007 13:58:17 +0200 From: Sami Farin To: xfs@oss.sgi.com, linux-kernel@vger.kernel.org Cc: Andrew Morton , David Chinner , Hugh Dickins , Nick Piggin Subject: Re: BUG: warning at mm/truncate.c:60/cancel_dirty_page() Message-ID: <20070108115816.GB3803@m.safari.iki.fi> Mail-Followup-To: xfs@oss.sgi.com, linux-kernel@vger.kernel.org, Andrew Morton , David Chinner , Hugh Dickins , Nick Piggin References: <20070106023907.GA7766@m.safari.iki.fi> <20070107222341.GT33919298@melbourne.sgi.com> <20070107144812.96357ff9.akpm@osdl.org> <20070107230436.GU33919298@melbourne.sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20070107230436.GU33919298@melbourne.sgi.com> User-Agent: Mutt/1.5.13 (2006-08-11)
X-archive-position: 10196 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: safari-xfs@safari.iki.fi Precedence: bulk X-list: xfs Content-Length: 1002 Lines: 28 On Mon, Jan 08, 2007 at 10:04:36 +1100, David Chinner wrote: > On Sun, Jan 07, 2007 at 02:48:12PM -0800, Andrew Morton wrote: > > On Mon, 8 Jan 2007 09:23:41 +1100 > > David Chinner wrote: > > > > > How are you supposed to invalidate a range of pages in a mapping for > > > this case, then? invalidate_mapping_pages() would appear to be the > > > candidate (the generic code uses this), but it _skips_ pages that > > > are already mapped. > > > > unmap_mapping_range()? > > /me looks at how it's used in invalidate_inode_pages2_range() and > decides it's easier not to call this directly. > > > > So, am I correct in assuming we should be calling invalidate_inode_pages2_range() > > > instead of truncate_inode_pages()? > > > > That would be conventional. > > .... in that case the following patch should fix the warning: I tried dt+strace+cinfo with this patch applied and got no warnings. Thanks for quick fix. -- Do what you love because life is too short for anything else. 
From owner-xfs@oss.sgi.com Mon Jan 8 05:40:55 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 08 Jan 2007 05:40:59 -0800 (PST) Received: from web59111.mail.re1.yahoo.com (web59111.mail.re1.yahoo.com [66.196.101.22]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l08Deqqw009792 for ; Mon, 8 Jan 2007 05:40:54 -0800 Received: (qmail 57541 invoked by uid 60001); 8 Jan 2007 13:13:13 -0000 DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=X-YMail-OSG:Received:Date:From:Subject:To:MIME-Version:Content-Type:Content-Transfer-Encoding:Message-ID; b=PEPdSTFHJLkQuWr5Bf+GaX+C5qw7z/cJl2tAjkD5fE8hCMhxNMmcxZP0Ie9uruNvlsDSTlkY4o1pLhkqk+TvpfT8HMi+amT4JnQOXusDvGMZnpYYqlrez2QdgkPIpXrO1R4UIu1ZbWUkPCf7Pr90SL46n7lMAe2jdnlWVjF0qG0=; X-YMail-OSG: gH7XkVsVM1lGAJQUO6aYVyqlEWke5ZYcOIwttweIcwf9ltYhEW8z6Iz1df84fPB9px_FMdp0vQtZdR3O29S_WyQKt5VtSJ00wn6qBKBt5453EvvGYHYQMqdGOeo6ft06fXQZGgyJBtPC2.wxBVKNMvdty8QPPI2HsNywWhPtP25_ Received: from [213.132.154.79] by web59111.mail.re1.yahoo.com via HTTP; Mon, 08 Jan 2007 05:13:12 PST Date: Mon, 8 Jan 2007 05:13:12 -0800 (PST) From: Dave N Subject: What's wrong with XFS? To: xfs@oss.sgi.com MIME-Version: 1.0 Message-ID: <936386.57179.qm@web59111.mail.re1.yahoo.com> Content-Type: text/plain Content-Disposition: inline Content-Transfer-Encoding: 7bit X-archive-position: 10198 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: mutex1@yahoo.com Precedence: bulk X-list: xfs Content-Length: 2408 Lines: 20 Hi, Can someone enlighten me what the issue is with XFS? I've been hearing a lot of good things on the Net about XFS. How it's lightening fast, how it has features other file systems do not have (like GRIO, real time volumes, allocate on flush, etc), how it scales very well, etc... but what I didn't hear about is how fast XFS screws things up if something wrong happens. 
Because of the good things I heard about XFS, I too decided to try it out (been using Ext3 or ReiserFS here for most of the time). Now I'm very disappointed in XFS. I live in an area where power outages are common and I do not have an UPS here. I have a few computers all running on XFS and thought that XFS will give me similar data-integrity like Ext3 or ReiserFS. Now, for the past few weeks I've been experiencing "strange behavior" from XFS. One time, I was reading an article on the Net and had only my Firefox browser open. Then we had a power outage for a short period of time, and when I logged in again into KDE, I was surprised to find out that all my desktop icons were messed up all over the place. The other time, again power outage, only this time I was working on a small text file. Booted up again only to find out that the file I was working on contained garbage and I had to start all over again. I also heard that XFS depends heavily on the application side for its data-integrity. XFS "thinks" that the application will use the proper calls when writing to disk. What???? How is it the task of the application to ensure the safety of your files??? IMO, programs are there to provide the tools to be productive, NOT to ensure the data safety of your files, that's the task of the file system. Even MySQL provides me with better data-integrity here. If I'm doing some database transaction and the power fails, I can be pretty sure that *most* of the time, MySQL will be just fine next time I boot up. Why oh why such a beautiful file system like XFS is so terrible at data-integrity? Look what Sun Microsystems did with their new ZFS file system... full atomicity, CRC checksumming and other features to ensure data-integrity... why can't XFS have such things? Thanks for listening to my preaching here guys Cheers! __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! 
Mail has the best spam protection around http://mail.yahoo.com [[HTML alternate version deleted]] From owner-xfs@oss.sgi.com Mon Jan 8 06:46:55 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 08 Jan 2007 06:47:03 -0800 (PST) Received: from smtp113.sbc.mail.mud.yahoo.com (smtp113.sbc.mail.mud.yahoo.com [68.142.198.212]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l08Eksqw022022 for ; Mon, 8 Jan 2007 06:46:55 -0800 Received: (qmail 71865 invoked from network); 8 Jan 2007 14:46:01 -0000 Received: from unknown (HELO stupidest.org) (cwedgwood@sbcglobal.net@24.5.75.45 with login) by smtp113.sbc.mail.mud.yahoo.com with SMTP; 8 Jan 2007 14:46:01 -0000 X-YMail-OSG: yCHEEAoVM1nr1jeANQdaVdHJaA5a7Isfj4K_s1adPn_f.p85cuQDjXyxjiGIjWWkCboecBArvkIeQVjguqxS6.Zltop6E1due9XHtVVEzN9USZaoLc8.J52CUCnRb1Zr57.FO_DrRmV.zz6wNGwnF0lvPGMkrlTMvIAMWB3mu8SMcD0l5V6ZrpTOLlUG Received: by tuatara.stupidest.org (Postfix, from userid 10000) id 8FFE6182614B; Mon, 8 Jan 2007 06:45:49 -0800 (PST) Date: Mon, 8 Jan 2007 06:45:49 -0800 From: Chris Wedgwood To: Dave N Cc: xfs@oss.sgi.com Subject: Re: What's wrong with XFS? Message-ID: <20070108144549.GA12073@tuatara.stupidest.org> References: <936386.57179.qm@web59111.mail.re1.yahoo.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <936386.57179.qm@web59111.mail.re1.yahoo.com> X-archive-position: 10199 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: cw@f00f.org Precedence: bulk X-list: xfs Content-Length: 2415 Lines: 60 On Mon, Jan 08, 2007 at 05:13:12AM -0800, Dave N wrote: > KDE, I was surprised to find out that all my desktop icons were > messed up all over the place. KDE made assumptions which are not only not true on linux but not true elsewhere either. Last I checked KDE dealt with the common cases that were problematic much better now. 
> The other time, again power outage, only this time I was working on > a small text file. Booted up again only to find out that the file I > was working on contained garbage and I had to start all over again. The file should not have contained garbage. Also, if you open+truncate+write a file it should be flushed very soon after close these days; the window is fairly small now. > I also heard that XFS depends heavily on the application side for > its data-integrity. XFS "thinks" that the application will use the > proper calls when writing to disk. What???? How is it the task of > the application to ensure the safety of your files??? It's always been that way, for many many years, even before Linux existed. If you want your applications to be portable and reliable then you have to do it right. MTAs are a good example of applications which typically get this right, because people care about lost email and the authors typically put some effort into making sure it's right. > IMO, programs are there to provide the tools to be productive, NOT > to ensure the data safety of your files, that's the task of the file > system. Even MySQL provides me with better data-integrity here. Does MySQL allow me to read or write 100s of MB/s continuously on cheap hardware (for not so cheap hardware I could ask for 7GB/s)? > Why oh why such a beautiful file system like XFS is so terrible at > data-integrity? There is a cost to full data journalling. Personally, even with ext3 I find the impact of this high enough that I don't use it. > Look what Sun Microsystems did with their new ZFS file > system... full atomicity, CRC checksumming and other features to > ensure data-integrity... You could argue XFS is showing its age; it's far from a new filesystem these days. ZFS is a very different animal to most traditional filesystems. > why can't XFS have such things? Because the realities of life sometimes collide with what people want ideally.
Linux can't have ZFS for licensing reasons, but you can have Solaris with ZFS: http://opensolaris.org/os/downloads/on/ From owner-xfs@oss.sgi.com Mon Jan 8 06:54:45 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 08 Jan 2007 06:54:51 -0800 (PST) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.126.187]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l08Eshqw023908 for ; Mon, 8 Jan 2007 06:54:44 -0800 Received: from [194.173.12.131] (helo=[172.25.16.7]) by mrelayeu.kundenserver.de (node=mrelayeu0) with ESMTP (Nemesis), id 0MKwh2-1H3vgP2b4x-00070B; Mon, 08 Jan 2007 15:41:06 +0100 Message-ID: <45A25800.6060603@gmx.net> Date: Mon, 08 Jan 2007 15:41:04 +0100 From: Klaus Strebel User-Agent: Thunderbird 1.5.0.9 (Windows/20061207) MIME-Version: 1.0 To: Dave N CC: xfs@oss.sgi.com Subject: Re: What's wrong with XFS? References: <936386.57179.qm@web59111.mail.re1.yahoo.com> In-Reply-To: <936386.57179.qm@web59111.mail.re1.yahoo.com> Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 8bit X-Provags-ID: kundenserver.de abuse@kundenserver.de login:8a7df7300d3d15a4f701302fdde7adf9 X-archive-position: 10200 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: klaus.strebel@gmx.net Precedence: bulk X-list: xfs Content-Length: 1173 Lines: 30 Dave N wrote: > Hi, > > Even MySQL provides me with better data-integrity here. If I'm doing some database transaction and the power fails, I can be pretty sure that *most* of the time, MySQL will be just fine next time I boot up. Hello Dave, MySQL is an application which takes care of data integrity (which XFS depends on, as you stated yourself ;-) ). XFS takes care of filesystem integrity, so that your MySQL can find the files whose content integrity it looks after (as an application, you see ;-) ). > > Why oh why such a beautiful file system like XFS is so terrible at data-integrity?
Look what Sun Microsystems did with their new ZFS file system... full atomicity, CRC checksumming and other features to ensure data-integrity... why can't XFS have such things? To mount multi-gigabyte filesystems after some kind of disaster in minutes, not in hours or days ;-). It only cares for metadata, not the data. > > Thanks for listening to my preaching here guys > > Cheers! -- Mit freundlichen Grüssen / best regards Klaus Strebel, Dipl.-Inform. (FH), mailto:klaus.strebel@gmx.net /"\ \ / ASCII RIBBON CAMPAIGN X AGAINST HTML MAIL / \ From owner-xfs@oss.sgi.com Mon Jan 8 07:13:16 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 08 Jan 2007 07:13:19 -0800 (PST) Received: from extgat1.local.navi.pl (ip-83-238-212-180.netia.com.pl [83.238.212.180]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l08FDDqw027209 for ; Mon, 8 Jan 2007 07:13:15 -0800 Received: from venus.local.navi.pl (www.local.navi.pl [192.168.1.10]) by extgat1.local.navi.pl (8.13.1/8.13.1) with ESMTP id l08EobEL003960 for ; Mon, 8 Jan 2007 15:50:37 +0100 Received: from venus.local.navi.pl (venus.local.navi.pl [192.168.1.10]) by venus.local.navi.pl (8.13.1/8.13.1) with ESMTP id l08Eobxm032566 for ; Mon, 8 Jan 2007 15:50:37 +0100 Subject: Re: What's wrong with XFS?
From: Olaf =?iso-8859-2?Q?Fr=B1czyk?= To: xfs@oss.sgi.com In-Reply-To: <936386.57179.qm@web59111.mail.re1.yahoo.com> References: <936386.57179.qm@web59111.mail.re1.yahoo.com> Content-Type: text/plain; charset=UTF-8 Date: Mon, 08 Jan 2007 15:50:37 +0100 Message-Id: <1168267837.29690.14.camel@venus.local.navi.pl> Mime-Version: 1.0 X-Mailer: Evolution 2.0.2 (2.0.2-3) Content-Transfer-Encoding: 8bit X-archive-position: 10201 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: olaf@cbk.poznan.pl Precedence: bulk X-list: xfs Content-Length: 2829 Lines: 35 On Mon, 2007-01-08 at 05:13 -0800, Dave N wrote: > Hi, > > Can someone enlighten me what the issue is with XFS? I've been hearing a lot of good things on the Net about XFS. How it's lightening fast, how it has features other file systems do not have (like GRIO, real time volumes, allocate on flush, etc), how it scales very well, etc... but what I didn't hear about is how fast XFS screws things up if something wrong happens. Because of the good things I heard about XFS, I too decided to try it out (been using Ext3 or ReiserFS here for most of the time). Now I'm very disappointed in XFS. I live in an area where power outages are common and I do not have an UPS here. I have a few computers all running on XFS and thought that XFS will give me similar data-integrity like Ext3 or ReiserFS. Now, for the past few weeks I've been experiencing "strange behavior" from XFS. One time, I was reading an article on the Net and had only my Firefox browser open. Then we had a power outage for a short period of time, and when I logged in again into > KDE, I was surprised to find out that all my desktop icons were messed up all over the place. The other time, again power outage, only this time I was working on a small text file. Booted up again only to find out that the file I was working on contained garbage and I had to start all over again. 
> > I also heard that XFS depends heavily on the application side for its data-integrity. XFS "thinks" that the application will use the proper calls when writing to disk. What???? How is it the task of the application to ensure the safety of your files??? IMO, programs are there to provide the tools to be productive, NOT to ensure the data safety of your files, that's the task of the file system. Even MySQL provides me with better data-integrity here. If I'm doing some database transaction and the power fails, I can be pretty sure that *most* of the time, MySQL will be just fine next time I boot up. > > Why oh why such a beautiful file system like XFS is so terrible at data-integrity? Look what Sun Microsystems did with their new ZFS file system... full atomicity, CRC checksumming and other features to ensure data-integrity... why can't XFS have such things? > > Thanks for listening to my preaching here guys > > Cheers! Hi, There is nothing wrong with XFS - your expectations are wrong. You expect data to be journaled, but XFS journals metadata only, not data. So what you get is filesystem integrity, not data integrity. If you want data integrity you need properly written applications, and __it is__ the application's job to care about its data. There is nothing unusual here. If you need data journaling then you need another filesystem, e.g. ext3. I expect you will find all of this in the FAQ.
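For illustration, "properly written" here usually means that the application itself asks for its data to be pushed to stable storage before it relies on it. A minimal sketch, assuming a POSIX system and Python 3; the helper name durable_write is hypothetical and is not taken from XFS or from any tool mentioned in this thread:

```python
import os
import tempfile


def durable_write(path, data):
    """Replace the file at `path` with `data` (bytes).

    After a crash the file should hold either the old or the new
    contents, not a truncated mix. Hypothetical helper for illustration.
    """
    directory = os.path.dirname(os.path.abspath(path))

    # Write to a temporary file in the same directory, so that the final
    # rename stays within one filesystem. mkstemp creates it with mode 0600.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # drain Python's userspace buffer
            os.fsync(tmp.fileno())  # ask the kernel to write the data out
        # Atomically switch the name over to the fully written file.
        os.replace(tmp_path, path)
    except BaseException:
        # Writing or renaming failed: do not leave the temporary file behind.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise

    # fsync the directory so that the rename itself is recorded on disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The temporary file plus rename is what avoids the open+truncate+write window: the old contents stay in place until the new ones are complete. The fsync calls only request that data reach the disk; whether it survives a power cut still depends on the drive honouring cache flushes.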
Regards, Olaf -- Olaf Frączyk From owner-xfs@oss.sgi.com Mon Jan 8 07:13:17 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 08 Jan 2007 07:13:21 -0800 (PST) Received: from extgat1.local.navi.pl (ip-83-238-212-180.netia.com.pl [83.238.212.180]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l08FDDr0027209 for ; Mon, 8 Jan 2007 07:13:16 -0800 Received: from venus.local.navi.pl (venus.local.navi.pl [192.168.1.10]) by extgat1.local.navi.pl (8.13.1/8.13.1) with ESMTP id l08EnFLv003957; Mon, 8 Jan 2007 15:49:16 +0100 Received: from venus.local.navi.pl (venus.local.navi.pl [192.168.1.10]) by venus.local.navi.pl (8.13.1/8.13.1) with ESMTP id l08EnFc0032437; Mon, 8 Jan 2007 15:49:15 +0100 Subject: Re: What's wrong with XFS? From: Olaf Fraczyk To: Dave N Cc: xfs@oss.sgi.com In-Reply-To: <936386.57179.qm@web59111.mail.re1.yahoo.com> References: <936386.57179.qm@web59111.mail.re1.yahoo.com> Content-Type: text/plain Organization: NAVI Date: Mon, 08 Jan 2007 15:49:15 +0100 Message-Id: <1168267755.29690.13.camel@venus.local.navi.pl> Mime-Version: 1.0 X-Mailer: Evolution 2.0.2 (2.0.2-3) Content-Transfer-Encoding: 7bit X-archive-position: 10202 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: olaf@navi.pl Precedence: bulk X-list: xfs Content-Length: 2826 Lines: 35 On Mon, 2007-01-08 at 05:13 -0800, Dave N wrote: > Hi, > > Can someone enlighten me what the issue is with XFS? I've been hearing a lot of good things on the Net about XFS. How it's lightening fast, how it has features other file systems do not have (like GRIO, real time volumes, allocate on flush, etc), how it scales very well, etc... but what I didn't hear about is how fast XFS screws things up if something wrong happens. Because of the good things I heard about XFS, I too decided to try it out (been using Ext3 or ReiserFS here for most of the time). Now I'm very disappointed in XFS.
I live in an area where power outages are common and I do not have an UPS here. I have a few computers all running on XFS and thought that XFS will give me similar data-integrity like Ext3 or ReiserFS. Now, for the past few weeks I've been experiencing "strange behavior" from XFS. One time, I was reading an article on the Net and had only my Firefox browser open. Then we had a power outage for a short period of time, and when I logged in again into > KDE, I was surprised to find out that all my desktop icons were messed up all over the place. The other time, again power outage, only this time I was working on a small text file. Booted up again only to find out that the file I was working on contained garbage and I had to start all over again. > > I also heard that XFS depends heavily on the application side for its data-integrity. XFS "thinks" that the application will use the proper calls when writing to disk. What???? How is it the task of the application to ensure the safety of your files??? IMO, programs are there to provide the tools to be productive, NOT to ensure the data safety of your files, that's the task of the file system. Even MySQL provides me with better data-integrity here. If I'm doing some database transaction and the power fails, I can be pretty sure that *most* of the time, MySQL will be just fine next time I boot up. > > Why oh why such a beautiful file system like XFS is so terrible at data-integrity? Look what Sun Microsystems did with their new ZFS file system... full atomicity, CRC checksumming and other features to ensure data-integrity... why can't XFS have such things? > > Thanks for listening to my preaching here guys > > Cheers! Hi, There is nothing wrong with XFS - your expectations are wrong. You expect data to be journaled, but XFS journals metadata only, not data. So what you get is filesystem integrity, not data integrity.
If you want data integrity you need properly written applications, and __it is__ the application's job to care about its data. There is nothing unusual here. If you need data journaling then you need another filesystem, e.g. ext3. I expect you will find all of this in the FAQ. Regards, Olaf -- Olaf Fraczyk NAVI From owner-xfs@oss.sgi.com Mon Jan 8 07:25:19 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 08 Jan 2007 07:25:24 -0800 (PST) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.126.171]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l08FPHqw029952 for ; Mon, 8 Jan 2007 07:25:18 -0800 Received: from [194.173.12.131] (helo=[172.25.16.7]) by mrelayeu.kundenserver.de (node=mrelayeu6) with ESMTP (Nemesis), id 0ML29c-1H3wMK0doj-0007pa; Mon, 08 Jan 2007 16:24:24 +0100 Message-ID: <45A26227.2080907@gmx.net> Date: Mon, 08 Jan 2007 16:24:23 +0100 From: Klaus Strebel User-Agent: Thunderbird 1.5.0.9 (Windows/20061207) MIME-Version: 1.0 To: Dave N CC: xfs@oss.sgi.com Subject: Re: What's wrong with XFS? References: <936386.57179.qm@web59111.mail.re1.yahoo.com> <20070108144549.GA12073@tuatara.stupidest.org> In-Reply-To: <20070108144549.GA12073@tuatara.stupidest.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Provags-ID: kundenserver.de abuse@kundenserver.de login:8a7df7300d3d15a4f701302fdde7adf9 X-archive-position: 10203 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: klaus.strebel@gmx.net Precedence: bulk X-list: xfs Content-Length: 2805 Lines: 80 Chris Wedgwood wrote: > On Mon, Jan 08, 2007 at 05:13:12AM -0800, Dave N wrote: > >> KDE, I was surprised to find out that all my desktop icons were >> messed up all over the place. > > KDE made assumptions which are not only not true on linux but not true > elsewhere either. Last I checked KDE dealt with the common cases that > were problematic much better now.
> >> The other time, again power outage, only this time I was working on >> a small text file. Booted up again only to find out that the file I >> was working on contained garbage and I had to start all over again. > > The file should not have contained garbage. Also, if you > open+truncate+write a file it should be flushed very soon after close > these days, the window is fairly small now. > >> I also heard that XFS depends heavily on the application side for >> its data-integrity. XFS "thinks" that the application will use the >> proper calls when writing to disk. What???? How is it the task of >> the application to ensure the safety of your files??? > > It's always been that way, for many many years, even before Linux > existed. If you want your applictions to be portable and reliable > then you have to do do it right. > > MTAs are a good example of applications which typically get this right > because people case about lost email and the authors typically take > some effort into make sure it's right. > >> IMO, programs are there to provide the tools to be productive, NOT >> to ensure the data safety of your files, that's the task of the file >> system. Even MySQL provides me with better data-integrity here. > > Does MySQL allow me to read or write 100s of MB/s continuously on > cheap hardware (for not so cheap hardware I could ask 7GB/s). > >> Why oh why such a beautiful file system like XFS is so terrible at >> data-integrity? > > There is a cost to full data journalling. Personally even with ext3 I > find the impact of this high enough I don't use it. > >> Look what Sun Microsystems did with their new ZFS file >> system... full atomicity, CRC checksumming and other features to >> ensure data-integrity... > > You could argue XFS is showing it's age, it's far from a new > filesystem these days. > > ZFS is a very different animal to most traditional filesystems. > >> why can't XFS have such things? 
> > Because the realities of life sometime collide with what people want > ideally. > > Linux can't have ZFS for licensing reasons but you can have Solaris > with ZFS: http://opensolaris.org/os/downloads/on/ > > FYI, just found this ;-) Klaus -- Mit freundlichen Grüssen / best regards Klaus Strebel, Dipl.-Inform. (FH), mailto:klaus.strebel@gmx.net /"\ \ / ASCII RIBBON CAMPAIGN X AGAINST HTML MAIL / \ From owner-xfs@oss.sgi.com Mon Jan 8 07:59:44 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 08 Jan 2007 07:59:48 -0800 (PST) Received: from sumo.dreamhost.com (sumo.dreamhost.com [66.33.216.29]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l08Fxgqw004776 for ; Mon, 8 Jan 2007 07:59:43 -0800 Received: from spaceymail-a1.dreamhost.com (sd-green-bigip-62.dreamhost.com [208.97.132.62]) by sumo.dreamhost.com (Postfix) with ESMTP id 1584C17E75A for ; Mon, 8 Jan 2007 07:35:38 -0800 (PST) Received: from jupiter.solar.net (cpe-24-27-90-21.houston.res.rr.com [24.27.90.21]) by spaceymail-a1.dreamhost.com (Postfix) with ESMTP id 308AC194F6C for ; Mon, 8 Jan 2007 07:35:35 -0800 (PST) From: Joe Bacom Reply-To: joe@docsimple.com To: xfs@oss.sgi.com Subject: Re: What's wrong with XFS? 
Date: Mon, 8 Jan 2007 09:35:36 -0600 User-Agent: KMail/1.9.1 References: <936386.57179.qm@web59111.mail.re1.yahoo.com> <1168267755.29690.13.camel@venus.local.navi.pl> In-Reply-To: <1168267755.29690.13.camel@venus.local.navi.pl> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart1755204.IqRS27UC8K"; protocol="application/pgp-signature"; micalg=pgp-sha1 Content-Transfer-Encoding: 7bit Message-Id: <200701080935.36736.joe@docsimple.com> X-archive-position: 10204 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: joe@docsimple.com Precedence: bulk X-list: xfs Content-Length: 4237 Lines: 107 --nextPart1755204.IqRS27UC8K Content-Type: text/plain; charset="iso-8859-6" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline The solution to Dave's problem seems obvious to me. If you care about your data and your hardware, buy a UPS with power conditioning, configure Linux to shut down when the battery gets low, and enjoy the peace of mind of knowing that even if you're away from your machine and the power goes off, the system will take care of itself. Joe On Monday 08 January 2007 08:49, you wrote: > On Mon, 2007-01-08 at 05:13 -0800, Dave N wrote: > > Hi, > > > > Can someone enlighten me what the issue is with XFS? I've been hearing a > > lot of good things on the Net about XFS. How it's lightening fast, how it > > has features other file systems do not have (like GRIO, real time > > volumes, allocate on flush, etc), how it scales very well, etc... but > > what I didn't hear about is how fast XFS screws things up if something > > wrong happens. Because of the good things I heard about XFS, I too > > decided to try it out (been using Ext3 or ReiserFS here for most of the > > time). Now I'm very disappointed in XFS. I live in an area where power > > outages are common and I do not have an UPS here.
I have a few computers > > all running on XFS and thought that XFS will give me similar > > data-integrity like Ext3 or ReiserFS. Now, for the past few weeks I've > > been experiencing "strange behavior" from XFS. One time, I was reading an > > article on the Net and had only my Firefox browser open. Then we had a > > power outage fo