From owner-linux-xfs@oss.sgi.com Mon May 1 02:50:13 2006 Received: with ECARTIS (v1.0.0; list linux-xfs); Mon, 01 May 2006 02:50:16 -0700 (PDT) Received: from mail.ukfsn.org (s2.ukfsn.org [217.158.120.143]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with ESMTP id k419mBn2007741 for ; Mon, 1 May 2006 02:50:12 -0700 Received: from localhost.localdomain (i-83-67-36-194.freedom2surf.net [83.67.36.194]) by mail.ukfsn.org (Postfix) with ESMTP id 1A858E7077; Mon, 1 May 2006 10:37:16 +0100 (BST) Received: from [10.0.0.90] (helo=[10.0.0.90]) by localhost.localdomain with esmtp (Exim 4.50) id 1FaUvM-0006NO-A8; Mon, 01 May 2006 10:42:36 +0100 Message-ID: <4455D7E7.1040203@dgreaves.com> Date: Mon, 01 May 2006 10:41:59 +0100 From: David Greaves User-Agent: Mail/News 1.5 (X11/20060228) MIME-Version: 1.0 To: Nathan Scott Cc: "'linux-kernel@vger.kernel.org'" , linux-xfs@oss.sgi.com, nickpiggin@yahoo.com.au Subject: Re: Bad page state in process 'nfsd' with xfs References: <4452797F.70700@dgreaves.com> <20060501080427.H1771752@wobbly.melbourne.sgi.com> In-Reply-To: <20060501080427.H1771752@wobbly.melbourne.sgi.com> X-Enigmail-Version: 0.94.0.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-archive-position: 7701 X-ecartis-version: Ecartis v1.0.0 Sender: linux-xfs-bounce@oss.sgi.com Errors-to: linux-xfs-bounce@oss.sgi.com X-original-sender: david@dgreaves.com Precedence: bulk X-list: linux-xfs -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Nathan Scott wrote: > Hi there, > > On Fri, Apr 28, 2006 at 09:22:23PM +0100, David Greaves wrote: > > But, the warning is triggered by the page count (16777216 above), and > that is 0x1000000 -- which is a huge, improbable count; that looks to > me like it could very well be the result of a single bit error too. > > You may have a hardware problem - try running memtest I guess. 
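The single-bit observation in the quoted text can be checked directly. The following is a minimal Python sketch, assuming only the page count reported in the warning (16777216); the variable names are illustrative and nothing here is taken from the kernel source.

```python
# Page count reported in the "Bad page state" warning quoted above.
page_count = 16777216

# A positive value with exactly one bit set satisfies n & (n - 1) == 0.
is_single_bit = page_count > 0 and (page_count & (page_count - 1)) == 0

# 0-based position of the highest set bit (the only one, if is_single_bit).
bit_index = page_count.bit_length() - 1

print(hex(page_count), is_single_bit, bit_index)
```

A count with a single set bit is what one flipped bit in an otherwise zero counter would look like, which would be consistent with the suggestion to test the memory, though it does not prove a hardware fault.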
Thanks guys It's in use a lot so I'll schedule some downtime, blow out the dust and run memtest (though I've done that before and it has been clean). I'll let you know how it goes... David - -- -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.3 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFEVdfn8LvjTle4P1gRAiHTAKCBakrWQCpHgo8qyfN6ZNryAxi3bQCdFkDn vQe781l5bQvq1a5BG2nF5sk= =jdAy -----END PGP SIGNATURE----- From owner-linux-xfs@oss.sgi.com Mon May 1 08:29:17 2006 Received: with ECARTIS (v1.0.0; list linux-xfs); Mon, 01 May 2006 08:29:19 -0700 (PDT) Received: from smtp107.sbc.mail.mud.yahoo.com (smtp107.sbc.mail.mud.yahoo.com [68.142.198.206]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with SMTP id k41FRE3P014076 for ; Mon, 1 May 2006 08:29:15 -0700 Received: (qmail 59252 invoked from network); 1 May 2006 15:21:39 -0000 Received: from unknown (HELO stupidest.org) (cwedgwood@sbcglobal.net@70.132.14.41 with login) by smtp107.sbc.mail.mud.yahoo.com with SMTP; 1 May 2006 15:21:39 -0000 Received: by taniwha.stupidest.org (Postfix, from userid 38689) id F3FB651FAC3; Mon, 1 May 2006 08:21:37 -0700 (PDT) Date: Mon, 1 May 2006 08:21:37 -0700 From: Chris Wedgwood To: David Greaves Cc: Nathan Scott , "'linux-kernel@vger.kernel.org'" , linux-xfs@oss.sgi.com, nickpiggin@yahoo.com.au Subject: Re: Bad page state in process 'nfsd' with xfs Message-ID: <20060501152137.GB24771@taniwha.stupidest.org> References: <4452797F.70700@dgreaves.com> <20060501080427.H1771752@wobbly.melbourne.sgi.com> <4455D7E7.1040203@dgreaves.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4455D7E7.1040203@dgreaves.com> X-archive-position: 7703 X-ecartis-version: Ecartis v1.0.0 Sender: linux-xfs-bounce@oss.sgi.com Errors-to: linux-xfs-bounce@oss.sgi.com X-original-sender: cw@f00f.org Precedence: bulk X-list: linux-xfs Content-Length: 401 Lines: 13 On Mon, May 01, 2006 at 10:41:59AM +0100, David Greaves wrote: > 
It's in use a lot so I'll schedule some downtime, blow out the dust > and run memtest (though I've done that before and it has been > clean). memtest doesn't always find bad memory sadly finding bad memory is hard, and sometimes it's exacerbated by complicated factors (heat from drives for example) i wish ecc memory was standard From owner-linux-xfs@oss.sgi.com Mon May 1 18:06:18 2006 Received: with ECARTIS (v1.0.0; list linux-xfs); Mon, 01 May 2006 18:06:21 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with SMTP id k4214E4g009518 for ; Mon, 1 May 2006 18:06:16 -0700 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA06840 for ; Tue, 2 May 2006 09:47:39 +1000 Received: by chook.melbourne.sgi.com (Postfix, from userid 16302) id 4826249AC116; Tue, 2 May 2006 09:47:38 +1000 (EST) To: linux-xfs@oss.sgi.com Subject: TAKE 907752 - acl Message-Id: <20060501234738.4826249AC116@chook.melbourne.sgi.com> Date: Tue, 2 May 2006 09:47:38 +1000 (EST) From: nathans@sgi.com (Nathan Scott) X-archive-position: 7704 X-ecartis-version: Ecartis v1.0.0 Sender: linux-xfs-bounce@oss.sgi.com Errors-to: linux-xfs-bounce@oss.sgi.com X-original-sender: nathans@sgi.com Precedence: bulk X-list: linux-xfs Content-Length: 1074 Lines: 23 Merge simple nftw-vs-symlinks handling fixes, based on suggestions from Andreas. 
Date: Tue May 2 09:46:43 AEST 2006 Workarea: chook.melbourne.sgi.com:/build/nathans/xfs-cmds Inspected by: nathans,agruen@suse.de The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/xfs-cmds/master-melb Modid: master-melb:xfs-cmds:25861a acl/VERSION - 1.77 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/acl/VERSION.diff?r1=text&tr1=1.77&r2=text&tr2=1.76&f=h acl/doc/CHANGES - 1.86 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/acl/doc/CHANGES.diff?r1=text&tr1=1.86&r2=text&tr2=1.85&f=h acl/debian/changelog - 1.74 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/acl/debian/changelog.diff?r1=text&tr1=1.74&r2=text&tr2=1.73&f=h acl/setfacl/setfacl.c - 1.18 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/acl/setfacl/setfacl.c.diff?r1=text&tr1=1.18&r2=text&tr2=1.17&f=h acl/getfacl/getfacl.c - 1.18 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/acl/getfacl/getfacl.c.diff?r1=text&tr1=1.18&r2=text&tr2=1.17&f=h From owner-linux-xfs@oss.sgi.com Thu May 4 14:23:32 2006 Received: with ECARTIS (v1.0.0; list linux-xfs); Thu, 04 May 2006 14:23:39 -0700 (PDT) Received: from orca.ele.uri.edu (orca.ele.uri.edu [131.128.51.63]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with ESMTP id k44LLQ54026179 for ; Thu, 4 May 2006 14:23:28 -0700 Received: from [192.168.1.4] (c-71-232-42-50.hsd1.ma.comcast.net [71.232.42.50]) (authenticated bits=0) by orca.ele.uri.edu (8.13.4/8.13.4) with ESMTP id k44LFfHN013440 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Thu, 4 May 2006 17:15:41 -0400 Subject: multiple write stream performance From: Ming Zhang Reply-To: mingz@ele.uri.edu To: xfs Content-Type: text/plain Date: Thu, 04 May 2006 17:15:35 -0400 Message-Id: <1146777335.3609.173.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.6.1 (2.6.1-1.fc5.2) Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.52 on 131.128.51.63 X-archive-position: 7711 X-ecartis-version: Ecartis v1.0.0 Sender: 
linux-xfs-bounce@oss.sgi.com
Errors-to: linux-xfs-bounce@oss.sgi.com
X-original-sender: mingz@ele.uri.edu
Precedence: bulk
X-list: linux-xfs
Content-Length: 1131
Lines: 33

Hi, all

I have a 8*300GB DISK RAID0 used to hold temporary large size media
files. Usually application will write those ~10GB files to it
sequentially.

Now I found that if I have one file write to it, I can get like
~260MB/s, but if i have 4 concurrent file write, i can only get
aggregated 192MB/s, with 16 concurrent writes, the aggregated throughput
becomes ~100MB/s.

Anybody know why I got such a bad write performance? I guess it is
because of seek back and forth.

This shows that spaces are still allocated to file with large chunks.
thus lead to the seek when writing different files. but why xfs can not
allocate space better?

[root@dualxeon bonnie++-1.03a]# xfs_bmap /tmp/t/v8
/tmp/t/v8:
        0: [0..49279]: 336480..385759
        1: [49280..192127]: 39321664..39464511
        2: [192128..229887]: 39485504..39523263
        3: [229888..267391]: 39571904..39609407
        4: [267392..590207]: 52509888..52832703
        5: [590208..620671]: 52847168..52877631
        6: [620672..663807]: 91995584..92038719
        7: [663808..677503]: 92098112..92111807
        8: [677504..691327]: 92130624..92144447

Ming

From owner-linux-xfs@oss.sgi.com Thu May 4 16:39:49 2006
Received: with ECARTIS (v1.0.0; list linux-xfs); Thu, 04 May 2006 16:39:52 -0700 (PDT)
Received: from orca.ele.uri.edu (orca.ele.uri.edu [131.128.51.63]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with ESMTP id k44NblD9011558 for ; Thu, 4 May 2006 16:39:49 -0700
Received: from [192.168.1.4] (c-71-232-42-50.hsd1.ma.comcast.net [71.232.42.50]) (authenticated bits=0) by orca.ele.uri.edu (8.13.4/8.13.4) with ESMTP id k44NW7Ie021389 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Thu, 4 May 2006 19:32:08 -0400
Subject: Re: multiple write stream performance
From: Ming Zhang
Reply-To: mingz@ele.uri.edu
To: chatz@melbourne.sgi.com
Cc: xfs
In-Reply-To: <445A8112.7050803@melbourne.sgi.com>
References:
<1146777335.3609.173.camel@localhost.localdomain> <445A8112.7050803@melbourne.sgi.com> Content-Type: text/plain Date: Thu, 04 May 2006 19:32:01 -0400 Message-Id: <1146785522.3609.186.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.6.1 (2.6.1-1.fc5.2) Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.52 on 131.128.51.63 X-archive-position: 7712 X-ecartis-version: Ecartis v1.0.0 Sender: linux-xfs-bounce@oss.sgi.com Errors-to: linux-xfs-bounce@oss.sgi.com X-original-sender: mingz@ele.uri.edu Precedence: bulk X-list: linux-xfs Content-Length: 3270 Lines: 87 On Fri, 2006-05-05 at 08:32 +1000, David Chatterton wrote: > Ming, > > What are the I/O characteristics of the application? Typically I > have seen direct I/O for video data at reasonable sizes, and > smaller buffered I/O for audio data in media apps. In the > worse case they mix buffered and direct to the same file. The > larger the I/O requests the better in terms of reducing > fragmentation. I feel that I here want the fragmentation. I will have 10-20 large size (~10GB) multimedia files write at same time to this RAID0. then later a background program will dump them to tape. so i want the concurrent write to be as soon as possible. so if xfs allocate 0 ~ (16MB-512) to file1, 16MB ~ (32MB-51) file2,..., then when write to file 1 to file N concurrently. the disk heads have to move back and forth among these places and thus leave the the poor performance i saw. ps, what u mean DDN, the full name is ___? ming > > Some applications take advantage of the preallocation APIs and > know that they are ingesting X GBs, and preallocate that space. > This may still be fragmented, but in most circumstnaces the > fragmentation is far less than without preallocation. > > Performance degrading with multiple writers is not unexpected > if they are jumping around a lot, and there is limited cache > of the controller etc. 
That is why for customers with demanding > media workloads we recommend storage like DDN that have very > large caches and can absorb lots of streams. But that costs > a lot more than a jbod! > > Coming soon we will introduce to XFS on linux a new mount option > that will put writers to files in different directories into > different allocation groups. If you only have one writer per > directory, then fragementation in those files can be significantly > better since the writers aren't fighting for space in the same > region of the filesystem. That will help here but I'm not sure > it will solve your problem. > > Thanks, > > David > > > Ming Zhang wrote: > > Hi, all > > > > I have a 8*300GB DISK RAID0 used to hold temporary large size media > > files. Usually application will write those ~10GB files to it > > sequentially. > > > > Now I found that if I have one file write to it, I can get like > > ~260MB/s, but if i have 4 concurrent file write, i can only get > > aggregated 192MB/s, with 16 concurrent writes, the aggregated throughput > > becomes ~100MB/s. > > > > Anybody know why I got such a bad write performance? I guess it is > > because of seek back and forth. > > > > This shows that spaces are still allocated to file with large chunks. > > thus lead to the seek when writing different files. but why xfs can not > > allocate space better? 
> > > > [root@dualxeon bonnie++-1.03a]# xfs_bmap /tmp/t/v8 > > /tmp/t/v8: > > 0: [0..49279]: 336480..385759 > > 1: [49280..192127]: 39321664..39464511 > > 2: [192128..229887]: 39485504..39523263 > > 3: [229888..267391]: 39571904..39609407 > > 4: [267392..590207]: 52509888..52832703 > > 5: [590208..620671]: 52847168..52877631 > > 6: [620672..663807]: 91995584..92038719 > > 7: [663808..677503]: 92098112..92111807 > > 8: [677504..691327]: 92130624..92144447 > > > > Ming > > > > > From owner-linux-xfs@oss.sgi.com Thu May 4 18:49:09 2006 Received: with ECARTIS (v1.0.0; list linux-xfs); Thu, 04 May 2006 18:49:13 -0700 (PDT) Received: from orca.ele.uri.edu (orca.ele.uri.edu [131.128.51.63]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with ESMTP id k451l7B4023565 for ; Thu, 4 May 2006 18:49:09 -0700 Received: from [192.168.1.4] (c-71-232-42-50.hsd1.ma.comcast.net [71.232.42.50]) (authenticated bits=0) by orca.ele.uri.edu (8.13.4/8.13.4) with ESMTP id k451fU56023604 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Thu, 4 May 2006 21:41:30 -0400 Subject: Re: multiple write stream performance From: Ming Zhang Reply-To: mingz@ele.uri.edu To: chatz@melbourne.sgi.com Cc: xfs In-Reply-To: <1146785522.3609.186.camel@localhost.localdomain> References: <1146777335.3609.173.camel@localhost.localdomain> <445A8112.7050803@melbourne.sgi.com> <1146785522.3609.186.camel@localhost.localdomain> Content-Type: text/plain Date: Thu, 04 May 2006 21:38:09 -0400 Message-Id: <1146793089.3609.223.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.6.1 (2.6.1-1.fc5.2) Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.52 on 131.128.51.63 X-archive-position: 7713 X-ecartis-version: Ecartis v1.0.0 Sender: linux-xfs-bounce@oss.sgi.com Errors-to: linux-xfs-bounce@oss.sgi.com X-original-sender: mingz@ele.uri.edu Precedence: bulk X-list: linux-xfs Content-Length: 3700 Lines: 100 Hi David Or we put fragmentation issue aside first. 
How could I allow multiple write streams to come in concurrently and get full speed potential by avoiding seek as much as possible? Thanks! Ming On Thu, 2006-05-04 at 19:32 -0400, Ming Zhang wrote: > On Fri, 2006-05-05 at 08:32 +1000, David Chatterton wrote: > > Ming, > > > > What are the I/O characteristics of the application? Typically I > > have seen direct I/O for video data at reasonable sizes, and > > smaller buffered I/O for audio data in media apps. In the > > worse case they mix buffered and direct to the same file. The > > larger the I/O requests the better in terms of reducing > > fragmentation. > > I feel that I here want the fragmentation. I will have 10-20 large size > (~10GB) multimedia files write at same time to this RAID0. then later a > background program will dump them to tape. so i want the concurrent > write to be as soon as possible. > > so if xfs allocate 0 ~ (16MB-512) to file1, 16MB ~ (32MB-51) file2,..., > then when write to file 1 to file N concurrently. the disk heads have to > move back and forth among these places and thus leave the the poor > performance i saw. > > ps, what u mean DDN, the full name is ___? > > ming > > > > > > Some applications take advantage of the preallocation APIs and > > know that they are ingesting X GBs, and preallocate that space. > > This may still be fragmented, but in most circumstnaces the > > fragmentation is far less than without preallocation. > > > > Performance degrading with multiple writers is not unexpected > > if they are jumping around a lot, and there is limited cache > > of the controller etc. That is why for customers with demanding > > media workloads we recommend storage like DDN that have very > > large caches and can absorb lots of streams. But that costs > > a lot more than a jbod! > > > > Coming soon we will introduce to XFS on linux a new mount option > > that will put writers to files in different directories into > > different allocation groups. 
If you only have one writer per > > directory, then fragementation in those files can be significantly > > better since the writers aren't fighting for space in the same > > region of the filesystem. That will help here but I'm not sure > > it will solve your problem. > > > > Thanks, > > > > David > > > > > > Ming Zhang wrote: > > > Hi, all > > > > > > I have a 8*300GB DISK RAID0 used to hold temporary large size media > > > files. Usually application will write those ~10GB files to it > > > sequentially. > > > > > > Now I found that if I have one file write to it, I can get like > > > ~260MB/s, but if i have 4 concurrent file write, i can only get > > > aggregated 192MB/s, with 16 concurrent writes, the aggregated throughput > > > becomes ~100MB/s. > > > > > > Anybody know why I got such a bad write performance? I guess it is > > > because of seek back and forth. > > > > > > This shows that spaces are still allocated to file with large chunks. > > > thus lead to the seek when writing different files. but why xfs can not > > > allocate space better? 
> > > > > > [root@dualxeon bonnie++-1.03a]# xfs_bmap /tmp/t/v8 > > > /tmp/t/v8: > > > 0: [0..49279]: 336480..385759 > > > 1: [49280..192127]: 39321664..39464511 > > > 2: [192128..229887]: 39485504..39523263 > > > 3: [229888..267391]: 39571904..39609407 > > > 4: [267392..590207]: 52509888..52832703 > > > 5: [590208..620671]: 52847168..52877631 > > > 6: [620672..663807]: 91995584..92038719 > > > 7: [663808..677503]: 92098112..92111807 > > > 8: [677504..691327]: 92130624..92144447 > > > > > > Ming > > > > > > > > > From owner-linux-xfs@oss.sgi.com Thu May 4 19:11:21 2006 Received: with ECARTIS (v1.0.0; list linux-xfs); Thu, 04 May 2006 19:11:23 -0700 (PDT) Received: from omx1.americas.sgi.com (omx1.americas.sgi.com [198.149.16.13]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with ESMTP id k4529ZDf026100 for ; Thu, 4 May 2006 19:11:21 -0700 Received: from internal-mail-relay1.corp.sgi.com (internal-mail-relay1.corp.sgi.com [198.149.32.52]) by omx1.americas.sgi.com (8.12.10/8.12.9/linux-outbound_gateway-1.1) with ESMTP id k44MZAnx007500 for ; Thu, 4 May 2006 17:35:10 -0500 Received: from outhouse.melbourne.sgi.com (outhouse.melbourne.sgi.com [134.14.52.145]) by internal-mail-relay1.corp.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id k44MX58s3059014; Thu, 4 May 2006 15:33:08 -0700 (PDT) Received: from [134.14.52.212] (shiva212.melbourne.sgi.com [134.14.52.212]) by outhouse.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id k44MX0rV1980866; Fri, 5 May 2006 08:33:01 +1000 (AEST) Message-ID: <445A8112.7050803@melbourne.sgi.com> Date: Fri, 05 May 2006 08:32:50 +1000 From: David Chatterton Reply-To: chatz@melbourne.sgi.com Organization: SGI User-Agent: Mozilla Thunderbird 1.0.6 (Windows/20050716) X-Accept-Language: en-us, en MIME-Version: 1.0 To: mingz@ele.uri.edu CC: xfs Subject: Re: multiple write stream performance References: <1146777335.3609.173.camel@localhost.localdomain> In-Reply-To: <1146777335.3609.173.camel@localhost.localdomain> Content-Type: 
text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-archive-position: 7714
X-ecartis-version: Ecartis v1.0.0
Sender: linux-xfs-bounce@oss.sgi.com
Errors-to: linux-xfs-bounce@oss.sgi.com
X-original-sender: chatz@melbourne.sgi.com
Precedence: bulk
X-list: linux-xfs
Content-Length: 2542
Lines: 69

Ming,

What are the I/O characteristics of the application? Typically I
have seen direct I/O for video data at reasonable sizes, and
smaller buffered I/O for audio data in media apps. In the
worst case they mix buffered and direct to the same file. The
larger the I/O requests the better in terms of reducing
fragmentation.

Some applications take advantage of the preallocation APIs and
know that they are ingesting X GBs, and preallocate that space.
This may still be fragmented, but in most circumstances the
fragmentation is far less than without preallocation.

Performance degrading with multiple writers is not unexpected
if they are jumping around a lot, and there is limited cache
of the controller etc. That is why for customers with demanding
media workloads we recommend storage like DDN that have very
large caches and can absorb lots of streams. But that costs
a lot more than a jbod!

Coming soon we will introduce to XFS on Linux a new mount option
that will put writers to files in different directories into
different allocation groups. If you only have one writer per
directory, then fragmentation in those files can be significantly
better since the writers aren't fighting for space in the same
region of the filesystem. That will help here but I'm not sure
it will solve your problem.

Thanks,

David


Ming Zhang wrote:
> Hi, all
>
> I have a 8*300GB DISK RAID0 used to hold temporary large size media
> files. Usually application will write those ~10GB files to it
> sequentially.
>
> Now I found that if I have one file write to it, I can get like
> ~260MB/s, but if i have 4 concurrent file write, i can only get
> aggregated 192MB/s, with 16 concurrent writes, the aggregated throughput
> becomes ~100MB/s.
>
> Anybody know why I got such a bad write performance? I guess it is
> because of seek back and forth.
>
> This shows that spaces are still allocated to file with large chunks.
> thus lead to the seek when writing different files. but why xfs can not
> allocate space better?
>
> [root@dualxeon bonnie++-1.03a]# xfs_bmap /tmp/t/v8
> /tmp/t/v8:
>         0: [0..49279]: 336480..385759
>         1: [49280..192127]: 39321664..39464511
>         2: [192128..229887]: 39485504..39523263
>         3: [229888..267391]: 39571904..39609407
>         4: [267392..590207]: 52509888..52832703
>         5: [590208..620671]: 52847168..52877631
>         6: [620672..663807]: 91995584..92038719
>         7: [663808..677503]: 92098112..92111807
>         8: [677504..691327]: 92130624..92144447
>
> Ming
>
>

From owner-linux-xfs@oss.sgi.com Fri May 5 11:12:24 2006
Received: with ECARTIS (v1.0.0; list linux-xfs); Fri, 05 May 2006 11:12:27 -0700 (PDT)
Received: from orca.ele.uri.edu (orca.ele.uri.edu [131.128.51.63]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with ESMTP id k45IAL1a020263 for ; Fri, 5 May 2006 11:12:23 -0700
Received: from [192.168.1.4] (c-71-232-42-50.hsd1.ma.comcast.net [71.232.42.50]) (authenticated bits=0) by orca.ele.uri.edu (8.13.4/8.13.4) with ESMTP id k45I4ivD024265 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Fri, 5 May 2006 14:04:45 -0400
Subject: how to get the metadata
From: Ming Zhang
Reply-To: mingz@ele.uri.edu
To: xfs
Content-Type: text/plain
Date: Fri, 05 May 2006 14:04:39 -0400
Message-Id: <1146852279.3609.300.camel@localhost.localdomain>
Mime-Version: 1.0
X-Mailer: Evolution 2.6.1 (2.6.1-1.fc5.2)
Content-Transfer-Encoding: 7bit
X-Scanned-By: MIMEDefang 2.52 on 131.128.51.63
X-archive-position: 7717
X-ecartis-version: Ecartis v1.0.0
Sender: linux-xfs-bounce@oss.sgi.com
Errors-to:
linux-xfs-bounce@oss.sgi.com
X-original-sender: mingz@ele.uri.edu
Precedence: bulk
X-list: linux-xfs
Content-Length: 622
Lines: 20

Hi

When running mkfs.xfs, we can get this information. But how can I get
this with an existing xfs? Thanks!

meta-data=/dev/vg1/v1            isize=256    agcount=3, agsize=131072000 blks
         =                       sectsz=512
data     =                       bsize=4096   blocks=393216000, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0


Ming

From owner-linux-xfs@oss.sgi.com Fri May 5 11:15:56 2006
Received: with ECARTIS (v1.0.0; list linux-xfs); Fri, 05 May 2006 11:16:00 -0700 (PDT)
Received: from service.eng.exegy.net (68-191-203-42.static.stls.mo.charter.com [68.191.203.42]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with ESMTP id k45IDsiS020747 for ; Fri, 5 May 2006 11:15:56 -0700
Received: from HANAFORD.eng.exegy.net (hanaford.eng.exegy.net [10.19.1.4]) by service.eng.exegy.net (8.13.1/8.13.1) with ESMTP id k45I8Fxo015193 for ; Fri, 5 May 2006 13:08:15 -0500
Received: from [10.19.4.98] ([10.19.4.98]) by HANAFORD.eng.exegy.net with Microsoft SMTPSVC(6.0.3790.1830); Fri, 5 May 2006 13:08:15 -0500
Message-ID: <445B948F.7050203@exegy.com>
Date: Fri, 05 May 2006 13:08:15 -0500
From: Dave Lloyd
User-Agent: Thunderbird 1.5 (X11/20051201)
MIME-Version: 1.0
To: xfs
Subject: Re: how to get the metadata
References: <1146852279.3609.300.camel@localhost.localdomain>
In-Reply-To: <1146852279.3609.300.camel@localhost.localdomain>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-OriginalArrivalTime: 05 May 2006 18:08:15.0710 (UTC) FILETIME=[E168E7E0:01C6706E]
X-archive-position: 7718
X-ecartis-version: Ecartis v1.0.0
Sender: linux-xfs-bounce@oss.sgi.com
Errors-to: linux-xfs-bounce@oss.sgi.com
X-original-sender: dlloyd@exegy.com
Precedence: bulk
X-list: linux-xfs
Content-Length: 779
Lines: 32

Ming Zhang wrote:
> Hi
>
> When running mkfs.xfs, we can get this information. But how can I get
> this with an existing xfs? Thanks!
>
> meta-data=/dev/vg1/v1            isize=256    agcount=3, agsize=131072000 blks
>          =                       sectsz=512
> data     =                       bsize=4096   blocks=393216000, imaxpct=25
>          =                       sunit=0      swidth=0 blks, unwritten=1
> naming   =version 2              bsize=4096
> log      =internal               bsize=4096   blocks=32768, version=1
>          =                       sectsz=512   sunit=0 blks
> realtime =none                   extsz=65536  blocks=0, rtextents=0
>
>
> Ming
>
>
>

Try xfs_growfs -n

--
Dave Lloyd
Test Engineer, Exegy, Inc.
314.450.5342
dlloyd@exegy.com

From owner-linux-xfs@oss.sgi.com Fri May 5 12:44:11 2006
Received: with ECARTIS (v1.0.0; list linux-xfs); Fri, 05 May 2006 12:45:13 -0700 (PDT)
Received: from orca.ele.uri.edu (orca.ele.uri.edu [131.128.51.63]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with ESMTP id k45JgApZ003074 for ; Fri, 5 May 2006 12:44:11 -0700
Received: from [192.168.1.4] (c-71-232-42-50.hsd1.ma.comcast.net [71.232.42.50]) (authenticated bits=0) by orca.ele.uri.edu (8.13.4/8.13.4) with ESMTP id k45JaOCb020482 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Fri, 5 May 2006 15:36:25 -0400
Subject: Re: how to get the metadata
From: Ming Zhang
Reply-To: mingz@ele.uri.edu
To: Dave Lloyd
Cc: xfs
In-Reply-To: <445B948F.7050203@exegy.com>
References: <1146852279.3609.300.camel@localhost.localdomain> <445B948F.7050203@exegy.com>
Content-Type: text/plain
Date: Fri, 05 May 2006 15:36:19 -0400
Message-Id: <1146857779.3609.306.camel@localhost.localdomain>
Mime-Version: 1.0
X-Mailer: Evolution 2.6.1 (2.6.1-1.fc5.2)
Content-Transfer-Encoding: 7bit
X-Scanned-By: MIMEDefang 2.52 on 131.128.51.63
X-archive-position: 7719
X-ecartis-version: Ecartis v1.0.0
Sender: linux-xfs-bounce@oss.sgi.com
Errors-to: linux-xfs-bounce@oss.sgi.com
X-original-sender: mingz@ele.uri.edu
Precedence: bulk
X-list: linux-xfs
Content-Length: 1033
Lines: 37

o, yes. u are right. thx!

-n     Specifies that no change to the filesystem is to be made. The
       filesystem geometry is printed, and argument checking is
       performed, but no growth occurs.

ming

On Fri, 2006-05-05 at 13:08 -0500, Dave Lloyd wrote:
> Ming Zhang wrote:
> > Hi
> >
> > When running mkfs.xfs, we can get this information. But how can I get
> > this with an existing xfs? Thanks!
> >
> > meta-data=/dev/vg1/v1            isize=256    agcount=3, agsize=131072000 blks
> >          =                       sectsz=512
> > data     =                       bsize=4096   blocks=393216000, imaxpct=25
> >          =                       sunit=0      swidth=0 blks, unwritten=1
> > naming   =version 2              bsize=4096
> > log      =internal               bsize=4096   blocks=32768, version=1
> >          =                       sectsz=512   sunit=0 blks
> > realtime =none                   extsz=65536  blocks=0, rtextents=0
> >
> >
> > Ming
> >
> >
> >
>
> Try xfs_growfs -n
>

From owner-linux-xfs@oss.sgi.com Mon May 8 04:20:17 2006
Received: with ECARTIS (v1.0.0; list linux-xfs); Mon, 08 May 2006 04:20:21 -0700 (PDT)
Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with SMTP id k48BICUm013943 for ; Mon, 8 May 2006 04:20:16 -0700
Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id VAA06042; Mon, 8 May 2006 21:12:27 +1000
Received: by chook.melbourne.sgi.com (Postfix, from userid 16302) id 78E234A588CB; Mon, 8 May 2006 21:12:25 +1000 (EST)
To: linux-xfs@oss.sgi.com, sgi.bugs.xfs@engr.sgi.com
Subject: TAKE 952681 - fix agfl refcount leak
Message-Id: <20060508111225.78E234A588CB@chook.melbourne.sgi.com>
Date: Mon, 8 May 2006 21:12:25 +1000 (EST)
From: nathans@sgi.com (Nathan Scott)
X-archive-position: 7720
X-ecartis-version: Ecartis v1.0.0
Sender: linux-xfs-bounce@oss.sgi.com
Errors-to: linux-xfs-bounce@oss.sgi.com
X-original-sender: nathans@sgi.com
Precedence: bulk
X-list: linux-xfs
Content-Length: 483
Lines: 14

Fix a possible metadata buffer (AGFL) refcount leak when fixing an AG freelist.
Date: Mon May 8 21:11:41 AEST 2006 Workarea: chook.melbourne.sgi.com:/build/nathans/xfs-linux Inspected by: tes@sgi.com The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/xfs-kern/xfs-linux-melb Modid: xfs-linux-melb:xfs-kern:25902a xfs_alloc.c - 1.179 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_alloc.c.diff?r1=text&tr1=1.179&r2=text&tr2=1.178&f=h From owner-linux-xfs@oss.sgi.com Thu May 11 23:15:29 2006 Received: with ECARTIS (v1.0.0; list linux-xfs); Thu, 11 May 2006 23:15:38 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with SMTP id k4C6DP6I009347 for ; Thu, 11 May 2006 23:15:27 -0700 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA00076; Fri, 12 May 2006 16:07:42 +1000 Received: by chook.melbourne.sgi.com (Postfix, from userid 16302) id 83F484A588E2; Fri, 12 May 2006 16:07:35 +1000 (EST) To: linux-xfs@oss.sgi.com, sgi.bugs.xfs@engr.sgi.com Subject: TAKE 952736 - fix noatime for mmap Message-Id: <20060512060735.83F484A588E2@chook.melbourne.sgi.com> Date: Fri, 12 May 2006 16:07:35 +1000 (EST) From: nathans@sgi.com (Nathan Scott) X-archive-position: 7724 X-ecartis-version: Ecartis v1.0.0 Sender: linux-xfs-bounce@oss.sgi.com Errors-to: linux-xfs-bounce@oss.sgi.com X-original-sender: nathans@sgi.com Precedence: bulk X-list: linux-xfs Content-Length: 651 Lines: 16 Fix a noatime regression related to updating inode atime field on mmap only. 
Date: Fri May 12 16:07:18 AEST 2006 Workarea: chook.melbourne.sgi.com:/build/nathans/xfs-linux Inspected by: sjv@sgi.com The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/xfs-kern/xfs-linux-melb Modid: xfs-linux-melb:xfs-kern:25922a linux-2.6/xfs_file.c - 1.134 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.6/xfs_file.c.diff?r1=text&tr1=1.134&r2=text&tr2=1.133&f=h linux-2.4/xfs_file.c - 1.124 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.4/xfs_file.c.diff?r1=text&tr1=1.124&r2=text&tr2=1.123&f=h From owner-linux-xfs@oss.sgi.com Fri May 12 14:07:27 2006 Received: with ECARTIS (v1.0.0; list linux-xfs); Fri, 12 May 2006 14:07:30 -0700 (PDT) Received: from g0.machinephasesystems.com (dsl092-191-029.sfo1.dsl.speakeasy.net [66.92.191.29]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with ESMTP id k4CL5NjH012609 for ; Fri, 12 May 2006 14:07:25 -0700 Received: from [192.168.1.67] (g.machinephasesystems.com [66.92.191.28]) by g0.machinephasesystems.com (8.13.6/8.13.6) with ESMTP id k4CJVO48011425 for ; Fri, 12 May 2006 12:31:24 -0700 Message-ID: <4464E3B5.8020602@wink.com> Date: Fri, 12 May 2006 12:36:21 -0700 From: Peter Broadwell User-Agent: Thunderbird 1.5.0.2 (X11/20060420) MIME-Version: 1.0 To: linux-xfs@oss.sgi.com Subject: Re: deep chmod|chown -R begin to start OOMkiller Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 7725 X-ecartis-version: Ecartis v1.0.0 Sender: linux-xfs-bounce@oss.sgi.com Errors-to: linux-xfs-bounce@oss.sgi.com X-original-sender: peter@wink.com Precedence: bulk X-list: linux-xfs Content-Length: 19221 Lines: 229 I seem to be having the same problem as CHIKAMA Masaki was having in December 7, 2005, namely "chown -R" running very slowly when hitting lots of files (~17 million in my case). My machine doesn't have the same constraints that David pointed to as at least part of the problem. 
I have fast disks, and lots of memory (though perhaps still bad logfile sizes) So I thought I'd feed into the discussion a bit, hoping for any other ideas... I'm most interested in anything to (safely) speed this up on a live file system as it has been running for nearly 24 hours so far... not hung or corrupted anything as far as I can tell. Following is possibly interesting info from uname, /proc/meminfo, /proc/slabinfo, ... (I don't have OOMkiller though): Thanks - ;;peter = = = = (start of info) = = = = peter@cl1 /data $ uname -sr Linux 2.6.14-gentoo-r2 peter@cl1 /data $ cat /proc/meminfo MemTotal: 8058120 kB MemFree: 2770704 kB Buffers: 12 kB Cached: 3412304 kB SwapCached: 6860 kB Active: 2914928 kB Inactive: 1673712 kB HighTotal: 0 kB HighFree: 0 kB LowTotal: 8058120 kB LowFree: 2770704 kB SwapTotal: 32129968 kB SwapFree: 32114220 kB Dirty: 16 kB Writeback: 0 kB Mapped: 1191804 kB Slab: 666680 kB CommitLimit: 36159028 kB Committed_AS: 1313628 kB PageTables: 4564 kB VmallocTotal: 34359738367 kB VmallocUsed: 24420 kB VmallocChunk: 34359713687 kB HugePages_Total: 0 HugePages_Free: 0 Hugepagesize: 2048 kB peter@cl1 /data $ cat /proc/slabinfo slabinfo - version: 2.1 # name : tunables : slabdata rpc_buffers 8 8 2048 2 1 : tunables 24 12 8 : slabdata 4 4 0 rpc_tasks 8 10 384 10 1 : tunables 54 27 8 : slabdata 1 1 0 rpc_inode_cache 8 12 832 4 1 : tunables 54 27 8 : slabdata 3 3 0 fib6_nodes 7 118 64 59 1 : tunables 120 60 8 : slabdata 2 2 0 ip6_dst_cache 7 24 320 12 1 : tunables 54 27 8 : slabdata 2 2 0 ndisc_cache 1 15 256 15 1 : tunables 120 60 8 : slabdata 1 1 0 RAWv6 4 4 896 4 1 : tunables 54 27 8 : slabdata 1 1 0 UDPv6 1 4 896 4 1 : tunables 54 27 8 : slabdata 1 1 0 tw_sock_TCPv6 0 0 192 20 1 : tunables 120 60 8 : slabdata 0 0 0 request_sock_TCPv6 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0 TCPv6 6 10 1536 5 2 : tunables 24 12 8 : slabdata 2 2 0 UNIX 41 54 640 6 1 : tunables 54 27 8 : slabdata 9 9 0 tcp_bind_bucket 34 448 32 112 1 : tunables 120 60 8 : 
slabdata 4 4 0 inet_peer_cache 0 0 64 59 1 : tunables 120 60 8 : slabdata 0 0 0 ip_fib_alias 14 118 64 59 1 : tunables 120 60 8 : slabdata 2 2 0 ip_fib_hash 14 118 64 59 1 : tunables 120 60 8 : slabdata 2 2 0 ip_dst_cache 36 48 320 12 1 : tunables 54 27 8 : slabdata 4 4 0 arp_cache 8 30 256 15 1 : tunables 120 60 8 : slabdata 2 2 0 RAW 3 11 704 11 2 : tunables 54 27 8 : slabdata 1 1 0 UDP 16 20 768 5 1 : tunables 54 27 8 : slabdata 4 4 0 tw_sock_TCP 23 40 192 20 1 : tunables 120 60 8 : slabdata 2 2 0 request_sock_TCP 8 30 128 30 1 : tunables 120 60 8 : slabdata 1 1 0 TCP 15 25 1408 5 2 : tunables 24 12 8 : slabdata 5 5 0 uhci_urb_priv 0 0 88 44 1 : tunables 120 60 8 : slabdata 0 0 0 scsi_cmd_cache 29 35 512 7 1 : tunables 54 27 8 : slabdata 5 5 0 cfq_ioc_pool 0 0 96 40 1 : tunables 120 60 8 : slabdata 0 0 0 cfq_pool 0 0 160 24 1 : tunables 120 60 8 : slabdata 0 0 0 crq_pool 0 0 88 44 1 : tunables 120 60 8 : slabdata 0 0 0 deadline_drq 607 760 96 40 1 : tunables 120 60 8 : slabdata 18 19 480 as_arq 0 0 112 34 1 : tunables 120 60 8 : slabdata 0 0 0 mqueue_inode_cache 1 4 896 4 1 : tunables 54 27 8 : slabdata 1 1 0 xfs_chashlist 205900 385952 32 112 1 : tunables 120 60 8 : slabdata 3446 3446 0 xfs_ili 273754 273760 192 20 1 : tunables 120 60 8 : slabdata 13688 13688 0 xfs_ifork 0 0 64 59 1 : tunables 120 60 8 : slabdata 0 0 0 xfs_efi_item 0 0 352 11 1 : tunables 54 27 8 : slabdata 0 0 0 xfs_efd_item 0 0 360 11 1 : tunables 54 27 8 : slabdata 0 0 0 xfs_buf_item 1 21 184 21 1 : tunables 120 60 8 : slabdata 1 1 0 xfs_dabuf 45 288 24 144 1 : tunables 120 60 8 : slabdata 2 2 0 xfs_da_state 0 0 488 8 1 : tunables 54 27 8 : slabdata 0 0 0 xfs_trans 186 351 872 9 2 : tunables 54 27 8 : slabdata 32 39 81 xfs_inode 275317 275317 528 7 1 : tunables 54 27 8 : slabdata 39331 39331 0 xfs_btree_cur 0 0 192 20 1 : tunables 120 60 8 : slabdata 0 0 0 xfs_bmap_free_item 0 0 24 144 1 : tunables 120 60 8 : slabdata 0 0 0 xfs_buf 288 414 408 9 1 : tunables 54 27 8 : slabdata 45 46 216 
xfs_ioend 32 54 144 27 1 : tunables 120 60 8 : slabdata 2 2 0 xfs_vnode 275316 275316 632 6 1 : tunables 54 27 8 : slabdata 45886 45886 0 ntfs_big_inode_cache 0 0 896 4 1 : tunables 54 27 8 : slabdata 0 0 0 ntfs_inode_cache 0 0 272 14 1 : tunables 54 27 8 : slabdata 0 0 0 ntfs_name_cache 0 0 512 8 1 : tunables 54 27 8 : slabdata 0 0 0 ntfs_attr_ctx_cache 0 0 64 59 1 : tunables 120 60 8 : slabdata 0 0 0 ntfs_index_ctx_cache 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0 nfs_write_data 36 36 832 9 2 : tunables 54 27 8 : slabdata 4 4 0 nfs_read_data 32 35 768 5 1 : tunables 54 27 8 : slabdata 7 7 0 nfs_inode_cache 1 4 912 4 1 : tunables 54 27 8 : slabdata 1 1 0 nfs_page 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0 isofs_inode_cache 0 0 632 6 1 : tunables 54 27 8 : slabdata 0 0 0 fat_inode_cache 0 0 664 6 1 : tunables 54 27 8 : slabdata 0 0 0 fat_cache 0 0 32 112 1 : tunables 120 60 8 : slabdata 0 0 0 hugetlbfs_inode_cache 1 6 600 6 1 : tunables 54 27 8 : slabdata 1 1 0 ext2_inode_cache 0 0 744 5 1 : tunables 54 27 8 : slabdata 0 0 0 ext2_xattr 0 0 88 44 1 : tunables 120 60 8 : slabdata 0 0 0 journal_handle 0 0 24 144 1 : tunables 120 60 8 : slabdata 0 0 0 journal_head 0 0 96 40 1 : tunables 120 60 8 : slabdata 0 0 0 revoke_table 0 0 16 202 1 : tunables 120 60 8 : slabdata 0 0 0 revoke_record 0 0 32 112 1 : tunables 120 60 8 : slabdata 0 0 0 ext3_inode_cache 0 0 792 5 1 : tunables 54 27 8 : slabdata 0 0 0 ext3_xattr 0 0 88 44 1 : tunables 120 60 8 : slabdata 0 0 0 reiser_inode_cache 0 0 704 5 1 : tunables 54 27 8 : slabdata 0 0 0 dnotify_cache 0 0 40 92 1 : tunables 120 60 8 : slabdata 0 0 0 eventpoll_pwq 0 0 72 53 1 : tunables 120 60 8 : slabdata 0 0 0 eventpoll_epi 0 0 192 20 1 : tunables 120 60 8 : slabdata 0 0 0 inotify_event_cache 0 0 40 92 1 : tunables 120 60 8 : slabdata 0 0 0 inotify_watch_cache 1 59 64 59 1 : tunables 120 60 8 : slabdata 1 1 0 kioctx 0 0 320 12 1 : tunables 54 27 8 : slabdata 0 0 0 kiocb 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 
0 fasync_cache 0 0 24 144 1 : tunables 120 60 8 : slabdata 0 0 0 shmem_inode_cache 840 850 792 5 1 : tunables 54 27 8 : slabdata 170 170 0 posix_timers_cache 0 0 168 23 1 : tunables 120 60 8 : slabdata 0 0 0 uid_cache 9 118 64 59 1 : tunables 120 60 8 : slabdata 2 2 0 sgpool-128 32 32 4096 1 1 : tunables 24 12 8 : slabdata 32 32 0 sgpool-64 32 32 2048 2 1 : tunables 24 12 8 : slabdata 16 16 0 sgpool-32 32 32 1024 4 1 : tunables 54 27 8 : slabdata 8 8 0 sgpool-16 45 48 512 8 1 : tunables 54 27 8 : slabdata 6 6 0 sgpool-8 52 60 256 15 1 : tunables 120 60 8 : slabdata 4 4 0 blkdev_ioc 114 201 56 67 1 : tunables 120 60 8 : slabdata 3 3 0 blkdev_queue 31 44 712 11 2 : tunables 54 27 8 : slabdata 4 4 0 blkdev_requests 311 630 264 15 1 : tunables 54 27 8 : slabdata 40 42 216 biovec-(256) 256 256 4096 1 1 : tunables 24 12 8 : slabdata 256 256 0 biovec-128 256 256 2048 2 1 : tunables 24 12 8 : slabdata 128 128 0 biovec-64 256 256 1024 4 1 : tunables 54 27 8 : slabdata 64 64 0 biovec-16 285 285 256 15 1 : tunables 120 60 8 : slabdata 19 19 0 biovec-4 864 1652 64 59 1 : tunables 120 60 8 : slabdata 27 28 480 biovec-1 482 1616 16 202 1 : tunables 120 60 8 : slabdata 8 8 108 bio 860 1500 128 30 1 : tunables 120 60 8 : slabdata 50 50 480 file_lock_cache 6 24 160 24 1 : tunables 120 60 8 : slabdata 1 1 0 sock_inode_cache 93 130 704 5 1 : tunables 54 27 8 : slabdata 26 26 0 skbuff_fclone_cache 20 32 448 8 1 : tunables 54 27 8 : slabdata 3 4 0 skbuff_head_cache 555 1035 256 15 1 : tunables 120 60 8 : slabdata 69 69 0 acpi_operand 1127 1166 72 53 1 : tunables 120 60 8 : slabdata 22 22 0 acpi_parse_ext 0 0 64 59 1 : tunables 120 60 8 : slabdata 0 0 0 acpi_parse 0 0 40 92 1 : tunables 120 60 8 : slabdata 0 0 0 acpi_state 0 0 88 44 1 : tunables 120 60 8 : slabdata 0 0 0 proc_inode_cache 667 690 616 6 1 : tunables 54 27 8 : slabdata 115 115 0 sigqueue 32 46 168 23 1 : tunables 120 60 8 : slabdata 2 2 0 radix_tree_node 232625 303359 536 7 1 : tunables 54 27 8 : slabdata 43337 43337 0 
bdev_cache 22 28 832 4 1 : tunables 54 27 8 : slabdata 7 7 0 sysfs_dir_cache 2946 3021 72 53 1 : tunables 120 60 8 : slabdata 57 57 0 mnt_cache 26 60 192 20 1 : tunables 120 60 8 : slabdata 3 3 0 inode_cache 1080 1085 584 7 1 : tunables 54 27 8 : slabdata 155 155 0 dentry_cache 252909 252909 224 17 1 : tunables 120 60 8 : slabdata 14877 14877 0 filp 883 1365 256 15 1 : tunables 120 60 8 : slabdata 91 91 0 names_cache 3 5 4096 1 1 : tunables 24 12 8 : slabdata 3 5 0 idr_layer_cache 77 84 528 7 1 : tunables 54 27 8 : slabdata 12 12 0 buffer_head 52111 139612 88 44 1 : tunables 120 60 8 : slabdata 3173 3173 0 mm_struct 67 77 1152 7 2 : tunables 24 12 8 : slabdata 11 11 0 vm_area_struct 2672 2814 184 21 1 : tunables 120 60 8 : slabdata 134 134 0 fs_cache 76 118 64 59 1 : tunables 120 60 8 : slabdata 2 2 0 files_cache 66 72 896 4 1 : tunables 54 27 8 : slabdata 18 18 0 signal_cache 109 120 640 6 1 : tunables 54 27 8 : slabdata 20 20 0 sighand_cache 103 108 2112 3 2 : tunables 24 12 8 : slabdata 36 36 0 task_struct 123 128 1728 4 2 : tunables 24 12 8 : slabdata 32 32 0 anon_vma 987 1440 24 144 1 : tunables 120 60 8 : slabdata 10 10 0 shared_policy_node 0 0 56 67 1 : tunables 120 60 8 : slabdata 0 0 0 numa_policy 39 404 16 202 1 : tunables 120 60 8 : slabdata 2 2 0 size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0 size-131072 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0 size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0 size-65536 2 2 65536 1 16 : tunables 8 4 0 : slabdata 2 2 0 size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0 size-32768 20 20 32768 1 8 : tunables 8 4 0 : slabdata 20 20 0 size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0 size-16384 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0 size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0 size-8192 17 17 8192 1 2 : tunables 8 4 0 : slabdata 17 17 0 size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 8 : slabdata 0 0 0 size-4096 269 270 4096 1 1 : 
tunables 24 12 8 : slabdata 269 270 0 size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 8 : slabdata 0 0 0 size-2048 708 736 2048 2 1 : tunables 24 12 8 : slabdata 363 368 0 size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 8 : slabdata 0 0 0 size-1024 350 368 1024 4 1 : tunables 54 27 8 : slabdata 92 92 0 size-512(DMA) 0 0 512 8 1 : tunables 54 27 8 : slabdata 0 0 0 size-512 619 640 512 8 1 : tunables 54 27 8 : slabdata 80 80 0 size-256(DMA) 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0 size-256 82 105 256 15 1 : tunables 120 60 8 : slabdata 7 7 0 size-192(DMA) 0 0 192 20 1 : tunables 120 60 8 : slabdata 0 0 0 size-192 1560 2000 192 20 1 : tunables 120 60 8 : slabdata 100 100 0 size-128(DMA) 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0 size-64(DMA) 0 0 64 59 1 : tunables 120 60 8 : slabdata 0 0 0 size-64 2672 9027 64 59 1 : tunables 120 60 8 : slabdata 153 153 0 size-32(DMA) 0 0 32 112 1 : tunables 120 60 8 : slabdata 0 0 0 size-128 3807 4950 128 30 1 : tunables 120 60 8 : slabdata 165 165 300 size-32 703 784 32 112 1 : tunables 120 60 8 : slabdata 7 7 0 kmem_cache 155 155 704 5 1 : tunables 54 27 8 : slabdata 31 31 0 peter@cl1 /data $ peter@cl1 /data $ df -k Filesystem 1K-blocks Used Available Use% Mounted on /dev/md/0 22479104 13991500 8487604 63% / udev 4029060 244 4028816 1% /dev /dev/md/1 449447808 338816792 110631016 76% /data none 4029060 0 4029060 0% /dev/shm cl4:/data 451279232 112298760 338980472 25% /mnt/cl4-data peter@cl1 /data $ xfs_info /data meta-data=/dev/md1 isize=256 agcount=16, agsize=7024672 blks = sectsz=512 data = bsize=4096 blocks=112394720, imaxpct=25 = sunit=16 swidth=64 blks, unwritten=1 naming =version 2 bsize=4096 log =internal bsize=4096 blocks=32768, version=1 = sectsz=512 sunit=0 blks realtime =none extsz=262144 blocks=0, rtextents=0 peter@cl1 /data $ = = = = (end of info) = = = = From owner-linux-xfs@oss.sgi.com Mon May 15 02:42:46 2006 Received: with ECARTIS (v1.0.0; list linux-xfs); Mon, 15 May 2006 02:42:49 -0700 (PDT) Received: 
from mta1.gsf.de (mta1.gsf.de [146.107.3.111]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with ESMTP id k4F9ehrv016174 for ; Mon, 15 May 2006 02:42:45 -0700 Received: from [127.0.0.1] (acouchis.gsf.de [146.107.217.183]) by mta1.gsf.de (Postfix) with ESMTP id 19D7A5388C for ; Mon, 15 May 2006 08:57:45 +0200 (CEST) Message-ID: <4468263B.6080609@gsf.de> Date: Mon, 15 May 2006 08:56:59 +0200 From: Yogesh Bhanu Reply-To: yogesh@gsf.de Organization: IBI/MIPS User-Agent: Thunderbird 1.5.0.2 (X11/20060308) MIME-Version: 1.0 To: linux-xfs@oss.sgi.com Subject: xfs_repair failing Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 7731 X-ecartis-version: Ecartis v1.0.0 Sender: linux-xfs-bounce@oss.sgi.com Errors-to: linux-xfs-bounce@oss.sgi.com X-original-sender: yogesh@gsf.de Precedence: bulk X-list: linux-xfs Content-Length: 1128 Lines: 28

Hi all,

My SLES9 XFS file server has run into some problems. It started with the
storage acting flaky, though XFS had switched the filesystem in question
offline. After I fixed the problem on the storage, I tried running
xfs_repair. (Before that I had mounted the file system in question.)
The run ends with the following error message:

xfs_repair -v -n /dev/vg01/data01
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
imap claims a free inode 292381 is in use, would correct imap and clear inode
xfs_repair: read failed: Input/output error
cannot read inode 292448, disk block 146224, cnt 32

Any ideas on how to fix the error, or should I head for the backups?

System configuration: SLES 9 with SP3 on a dual-processor AMD Opteron (246)
system with 8 GB RAM. Storage (DS4300) is connected by FC and uses LVM to
mount a 3.5 TB volume.
From owner-linux-xfs@oss.sgi.com Mon May 15 03:05:52 2006 Received: with ECARTIS (v1.0.0; list linux-xfs); Mon, 15 May 2006 03:05:55 -0700 (PDT) Received: from lucidpixels.com (lucidpixels.com [66.45.37.187]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with ESMTP id k4FA3piu018511 for ; Mon, 15 May 2006 03:05:52 -0700 Received: by lucidpixels.com (Postfix, from userid 1001) id 55D4A15EA99; Mon, 15 May 2006 06:03:49 -0400 (EDT) Received: from localhost (localhost [127.0.0.1]) by lucidpixels.com (Postfix) with ESMTP id 52E43100EC460; Mon, 15 May 2006 06:03:49 -0400 (EDT) Date: Mon, 15 May 2006 06:03:49 -0400 (EDT) From: Justin Piszcz X-X-Sender: jpiszcz@p34 To: Yogesh Bhanu cc: linux-xfs@oss.sgi.com Subject: Re: xfs_repair failing In-Reply-To: <4468263B.6080609@gsf.de> Message-ID: References: <4468263B.6080609@gsf.de> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-archive-position: 7732 X-ecartis-version: Ecartis v1.0.0 Sender: linux-xfs-bounce@oss.sgi.com Errors-to: linux-xfs-bounce@oss.sgi.com X-original-sender: jpiszcz@lucidpixels.com Precedence: bulk X-list: linux-xfs Content-Length: 1301 Lines: 36 Any errors in dmesg/logs? xfs_repair: read failed: Input/output error Looks like a bad disk? On Mon, 15 May 2006, Yogesh Bhanu wrote: > Hi all , > My SLES9 xfs file server has runinto some problems, > It started with storage acting flaky, though xfs had switched the filesystem > inquestion offline. After I have fixed the problem on the storage. I tried > running xfs_repair.(Before that I have mounted the file system in question). > The system ends up with the following error message. > > xfs_repair -v -n /dev/vg01/data01 > Phase 1 - find and verify superblock... > Phase 2 - using internal log > - scan filesystem freespace and inode maps... > - found root inode chunk > Phase 3 - for each AG... > - scan (but don't clear) agi unlinked lists... > - process known inodes and perform inode discovery... 
> - agno = 0 > imap claims a free inode 292381 is in use, would correct imap and clear inode > xfs_repair: read failed: Input/output error > cannot read inode 292448, disk block 146224, cnt 32 > > Any Ideas , to fix the error or should I head for backups. > > System configuration . SLES 9 with SP3 on AMD Opteron (246) Dual Processor > system with 8 GB RAM . Storage is connected (DS4300) by FC and uses LVM to > mount 3.5 TB volume. > > From owner-linux-xfs@oss.sgi.com Mon May 15 03:15:46 2006 Received: with ECARTIS (v1.0.0; list linux-xfs); Mon, 15 May 2006 03:15:49 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with SMTP id k4FADhBc019752 for ; Mon, 15 May 2006 03:15:45 -0700 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id UAA01575; Mon, 15 May 2006 20:13:35 +1000 Received: by chook.melbourne.sgi.com (Postfix, from userid 16302) id ECFFF4A588D2; Mon, 15 May 2006 20:13:33 +1000 (EST) To: linux-xfs@oss.sgi.com, sgi.bugs.xfs@engr.sgi.com Subject: PARTIAL TAKE 952342 - fix repair compilation Message-Id: <20060515101333.ECFFF4A588D2@chook.melbourne.sgi.com> Date: Mon, 15 May 2006 20:13:33 +1000 (EST) From: nathans@sgi.com (Nathan Scott) X-archive-position: 7733 X-ecartis-version: Ecartis v1.0.0 Sender: linux-xfs-bounce@oss.sgi.com Errors-to: linux-xfs-bounce@oss.sgi.com X-original-sender: nathans@sgi.com Precedence: bulk X-list: linux-xfs Content-Length: 860 Lines: 19 Fix compilation for xfs_repair after recent optimisations. __inline -> static inline, and remove debug-only copies of some routines. 
Date: Mon May 15 20:12:57 AEST 2006 Workarea: chook.melbourne.sgi.com:/build/nathans/xfs-cmds Inspected by: nathans The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/xfs-cmds/master-melb Modid: master-melb:xfs-cmds:25940a xfsprogs/repair/incore.h - 1.11 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/xfsprogs/repair/incore.h.diff?r1=text&tr1=1.11&r2=text&tr2=1.10&f=h xfsprogs/repair/avl.h - 1.9 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/xfsprogs/repair/avl.h.diff?r1=text&tr1=1.9&r2=text&tr2=1.8&f=h xfsprogs/repair/incore_ino.c - 1.13 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/xfsprogs/repair/incore_ino.c.diff?r1=text&tr1=1.13&r2=text&tr2=1.12&f=h From owner-linux-xfs@oss.sgi.com Mon May 15 04:02:36 2006 Received: with ECARTIS (v1.0.0; list linux-xfs); Mon, 15 May 2006 04:02:41 -0700 (PDT) Received: from mail.cohaesio.net (penguin.cohaesio.net [212.97.129.34]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with ESMTP id k4FB0X3J024556 for ; Mon, 15 May 2006 04:02:35 -0700 Received: from cohsrv1.cohaesio.com (cohsrv1.cohaesio.com [212.97.128.131]) by mail.cohaesio.net (Postfix) with ESMTP id 30229FB4CD; Mon, 15 May 2006 11:59:35 +0200 (CEST) Received: from homer.cohaesio.com ([212.97.128.136]) by cohsrv1.cohaesio.com with Microsoft SMTPSVC(6.0.3790.1830); Mon, 15 May 2006 11:59:47 +0200 From: Anders Saaby Organization: Cohaesio A/S To: Peter Broadwell Subject: Re: deep chmod|chown -R begin to start OOMkiller Date: Mon, 15 May 2006 11:59:34 +0200 User-Agent: KMail/1.9.1 Cc: linux-xfs@oss.sgi.com References: <4464E3B5.8020602@wink.com> In-Reply-To: <4464E3B5.8020602@wink.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200605151159.34802.as@cohaesio.com> X-OriginalArrivalTime: 15 May 2006 09:59:47.0578 (UTC) FILETIME=[4C885DA0:01C67806] X-archive-position: 7734 X-ecartis-version: Ecartis v1.0.0 Sender: 
linux-xfs-bounce@oss.sgi.com Errors-to: linux-xfs-bounce@oss.sgi.com X-original-sender: as@cohaesio.com Precedence: bulk X-list: linux-xfs Content-Length: 20283 Lines: 318

Hi,

Do you have high CPU usage when running the chown? Or are the processes
just hanging in D-state?

On Friday 12 May 2006 21:36, Peter Broadwell wrote:
> I seem to be having the same problem as CHIKAMA Masaki was having in
> December 7, 2005, namely "chown -R" running very slowly when hitting lots
> of files (~17 million in my case).
>
> My machine doesn't have the same constraints that David pointed to as at
> least part of the problem.
> I have fast disks, and lots of memory (though perhaps still bad logfile
> sizes) So I thought I'd feed into the discussion a bit, hoping for any
> other ideas...
>
> I'm most interested in anything to (safely) speed this up on a live file
> system as it has been running for nearly 24 hours so far... not hung or
> corrupted anything as far as I can tell.
>
> Following is possibly interesting info from uname, /proc/meminfo,
> /proc/slabinfo, ...
(I don't have OOMkiller though):
>
> Thanks -
>
> ;;peter
>
> [...]

-- Med venlig
hilsen - Best regards - Meilleures salutations

Anders Saaby
Systems Engineer
------------------------------------------------
Cohaesio A/S - Maglebjergvej 5D - DK-2800 Lyngby
Phone: +45 45 880 888 - Fax: +45 45 880 777
Mail: as@cohaesio.com - http://www.cohaesio.com
------------------------------------------------

From owner-linux-xfs@oss.sgi.com Mon May 15 04:18:15 2006
Received: with ECARTIS (v1.0.0; list linux-xfs); Mon, 15 May 2006 04:18:19 -0700 (PDT)
Received: from mail.cohaesio.net (penguin.cohaesio.net [212.97.129.34]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with ESMTP id k4FBGEne026951 for ; Mon, 15 May 2006 04:18:15 -0700
Received: from cohsrv1.cohaesio.com (cohsrv1.cohaesio.com [212.97.128.131]) by mail.cohaesio.net (Postfix) with ESMTP id 2F3ABFB55C; Mon, 15 May 2006 11:54:20 +0200 (CEST)
Received: from homer.cohaesio.com ([212.97.128.136]) by cohsrv1.cohaesio.com with Microsoft SMTPSVC(6.0.3790.1830); Mon, 15 May 2006 11:54:32 +0200
From: Anders Saaby
Organization: Cohaesio A/S
To: yogesh@gsf.de
Subject: Re: xfs_repair failing
Date: Mon, 15 May 2006 11:54:19 +0200
User-Agent: KMail/1.9.1
Cc: linux-xfs@oss.sgi.com
References: <4468263B.6080609@gsf.de>
In-Reply-To: <4468263B.6080609@gsf.de>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200605151154.19782.as@cohaesio.com>
X-OriginalArrivalTime: 15 May 2006 09:54:32.0531 (UTC) FILETIME=[90C00230:01C67805]
X-archive-position: 7735
X-ecartis-version: Ecartis v1.0.0
Sender: linux-xfs-bounce@oss.sgi.com
Errors-to: linux-xfs-bounce@oss.sgi.com
X-original-sender: as@cohaesio.com
Precedence: bulk
X-list: linux-xfs
Content-Length: 491
Lines: 18

Hi,

On Monday 15 May 2006 08:56, Yogesh Bhanu wrote:
> xfs_repair: read failed: Input/output error

Looks like a disk error. Did you check the dmesg output?

-- 
Med venlig hilsen - Best regards - Meilleures salutations

Anders Saaby
Systems Engineer
------------------------------------------------
Cohaesio A/S - Maglebjergvej 5D - DK-2800 Lyngby
Phone: +45 45 880 888 - Fax: +45 45 880 777
Mail: as@cohaesio.com - http://www.cohaesio.com
------------------------------------------------

From owner-linux-xfs@oss.sgi.com Mon May 15 06:59:47 2006
Received: with ECARTIS (v1.0.0; list linux-xfs); Mon, 15 May 2006 06:59:49 -0700 (PDT)
Received: from mta1.gsf.de (mta1.gsf.de [146.107.3.111]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with ESMTP id k4FDvjTm016181 for ; Mon, 15 May 2006 06:59:47 -0700
Received: from [127.0.0.1] (acouchis.gsf.de [146.107.217.183]) by mta1.gsf.de (Postfix) with ESMTP id 48FB25382E for ; Mon, 15 May 2006 15:57:44 +0200 (CEST)
Message-ID: <446888D8.9050003@gsf.de>
Date: Mon, 15 May 2006 15:57:44 +0200
From: Yogesh Bhanu
Reply-To: yogesh@gsf.de
Organization: IBI/MIPS
User-Agent: Thunderbird 1.5.0.2 (X11/20060308)
MIME-Version: 1.0
Cc: linux-xfs@oss.sgi.com
Subject: Re: xfs_repair failing
References: <4468263B.6080609@gsf.de> <200605151154.19782.as@cohaesio.com>
In-Reply-To: <200605151154.19782.as@cohaesio.com>
Content-Type: text/plain; charset=iso-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-archive-position: 7736
X-ecartis-version: Ecartis v1.0.0
Sender: linux-xfs-bounce@oss.sgi.com
Errors-to: linux-xfs-bounce@oss.sgi.com
X-original-sender: yogesh@gsf.de
Precedence: bulk
X-list: linux-xfs
Content-Length: 483
Lines: 14

Hi,

Thanks for the pointers, Justin and Anders.

After rebuilding the RAID, I thought everything was back to normal: there
were no more errors in either the storage logs or the system logs
(/var/log/messages). But when I ran xfs_repair, similar SCSI sense error
messages appeared again in /var/log/messages, and the storage logged
errors as well, because another disk was about to fail.

I'm rebuilding the RAID now, so hopefully we will be clean after this
rebuild.

Thanks yogesh From owner-linux-xfs@oss.sgi.com Mon May 15 07:41:51 2006 Received: with ECARTIS (v1.0.0; list linux-xfs); Mon, 15 May 2006 07:41:54 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with SMTP id k4FEdmHN020457 for ; Mon, 15 May 2006 07:41:50 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id XAA04883; Mon, 15 May 2006 23:29:42 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id k4FDTdh31492147; Mon, 15 May 2006 23:29:40 +1000 (AEST) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id k4FDTaqR1490716; Mon, 15 May 2006 23:29:36 +1000 (AEST) Date: Mon, 15 May 2006 23:29:36 +1000 From: David Chinner To: Peter Broadwell Cc: linux-xfs@oss.sgi.com Subject: Re: deep chmod|chown -R begin to start OOMkiller Message-ID: <20060515132936.GN1331387@melbourne.sgi.com> References: <4464E3B5.8020602@wink.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4464E3B5.8020602@wink.com> User-Agent: Mutt/1.4.2.1i X-archive-position: 7737 X-ecartis-version: Ecartis v1.0.0 Sender: linux-xfs-bounce@oss.sgi.com Errors-to: linux-xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: linux-xfs Content-Length: 2573 Lines: 65 On Fri, May 12, 2006 at 12:36:21PM -0700, Peter Broadwell wrote: > I seem to be having the same problem as CHIKAMA Masaki was having in > December 7, 2005, > namely "chown -R" running very slowly when hitting lots of files (~17 > million in my case). The problem is different because there's no OOM killer being invoked, right? All you see is a slowdown? How much CPU time is the chmod consuming? 
> I'm most interested in anything to (safely) speed this up on a live file
> system as it
> has been running for nearly 24 hours so far... not hung or corrupted
> anything as far
> as I can tell.

Well, doing a chmod on a single file requires an inode read,
a log write, and eventually an inode write.

> xfs_chashlist     205900 385952     32  112    1 : tunables  120   60    8
> xfs_ili           273754 273760    192   20    1 : tunables  120   60    8
> xfs_inode         275317 275317    528    7    1 : tunables   54   27    8
> xfs_vnode         275316 275316    632    6    1 : tunables   54   27    8
> dentry_cache      252909 252909    224   17    1 : tunables  120   60    8

From the inode to cluster ratio (xfs_inode/xfs_chashlist), you've
got very sparse inode clusters, so each inode read and write will do
a disk I/O. So, two I/Os per file chmod() plus a log write every few
files plus directory reads. That makes it roughly 40 million I/Os
to do your recursive chmod.

On a single disk sustaining 200 I/Os per second, I'd expect it to
take more than a couple of days to complete the recursive chmod. Your
filesystem is going to be slow while this is going on as well.

> peter@cl1 /data $ df -k
> Filesystem           1K-blocks      Used Available Use% Mounted on
> /dev/md/1            449447808 338816792 110631016  76% /data

peter@cl1 /data $ xfs_info /data
....
data     =                       bsize=4096   blocks=112394720, imaxpct=25
         =                       sunit=16     swidth=64 blks, unwritten=1

So a 64k stripe unit and 4-unit wide stripe. What RAID level are you
using for your stripe? What's the spindle speed of the disks?

log      =internal               bsize=4096   blocks=32768, version=1

With a 128MB version 1 log.

If you were using version 2 logs, I'd suggest using a larger
log buffer size to reduce the number of log writes. That would
help quite a bit. Other than that, I can't think of much you
could tune to help here. When you need to do that many I/Os,
the only thing that speeds it up is to have lots of spindles...

Cheers,

Dave.
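
[Editorial sketch, not part of the original message.] The back-of-envelope figures in this reply can be re-derived with a few lines of arithmetic. The inputs are the round numbers quoted in the thread (about 17 million files, 200 I/Os per second for one disk, the slabinfo counts, and the xfs_info geometry); the variable names are made up for illustration, and the 40 million total is Dave's own rounding that allows for log writes and directory reads.

```python
# Rough reproduction of the estimate in this message. All inputs are the
# round numbers quoted in the thread; the names are illustrative only.

FILES = 17_000_000                # "~17 million" files in the original report
IOS_PER_FILE = 2                  # one inode read plus one inode write
ESTIMATED_TOTAL_IOS = 40_000_000  # Dave's rounded total (adds log/directory I/O)
DISK_IOPS = 200                   # assumed sustained I/Os per second, one disk

BLOCK_SIZE = 4096                 # bsize from xfs_info
SUNIT_BLOCKS = 16                 # sunit from xfs_info
SWIDTH_BLOCKS = 64                # swidth from xfs_info
LOG_BLOCKS = 32768                # log blocks from xfs_info

XFS_INODE_OBJS = 275317           # active xfs_inode objects in slabinfo
XFS_CHASHLIST_OBJS = 205900       # active xfs_chashlist objects in slabinfo

# Inode I/O alone, before log writes and directory reads are added.
inode_ios = FILES * IOS_PER_FILE

# Time for the rounded total at the assumed single-disk rate.
seconds = ESTIMATED_TOTAL_IOS / DISK_IOPS
days = seconds / 86400

# Cached inodes per cached cluster; a value near 1 means sparse clusters.
inodes_per_cluster = XFS_INODE_OBJS / XFS_CHASHLIST_OBJS

# Stripe geometry and log size derived from the block counts.
stripe_unit_kib = SUNIT_BLOCKS * BLOCK_SIZE // 1024
stripe_units = SWIDTH_BLOCKS // SUNIT_BLOCKS
log_mib = LOG_BLOCKS * BLOCK_SIZE // (1024 * 1024)

print(f"inode I/Os only:      {inode_ios:,}")
print(f"estimated duration:   {days:.2f} days")
print(f"inodes per cluster:   {inodes_per_cluster:.2f}")
print(f"stripe unit:          {stripe_unit_kib} KiB x {stripe_units} units")
print(f"log size:             {log_mib} MiB")
```

The duration comes out a little over two days, which matches "more than a couple of days" for a single disk; more spindles divide that figure roughly in proportion, under the same assumptions.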
-- 
Dave Chinner
R&D Software Engineer
SGI Australian Software Group

From owner-linux-xfs@oss.sgi.com Mon May 15 17:02:04 2006
Received: with ECARTIS (v1.0.0; list linux-xfs); Mon, 15 May 2006 17:02:09 -0700 (PDT)
Received: from g0.machinephasesystems.com (dsl092-191-029.sfo1.dsl.speakeasy.net [66.92.191.29]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with ESMTP id k4G0026o013754 for ; Mon, 15 May 2006 17:02:04 -0700
Received: from [192.168.1.67] (g.machinephasesystems.com [66.92.191.28]) by g0.machinephasesystems.com (8.13.6/8.13.6) with ESMTP id k4FLqo9s005602; Mon, 15 May 2006 14:52:50 -0700
Message-ID: <4468F967.6090202@wink.com>
Date: Mon, 15 May 2006 14:57:59 -0700
From: Peter Broadwell
User-Agent: Thunderbird 1.5.0.2 (X11/20060420)
MIME-Version: 1.0
To: David Chinner
CC: linux-xfs@oss.sgi.com
Subject: Re: deep chmod|chown -R begin to start OOMkiller
References: <4464E3B5.8020602@wink.com> <20060515132936.GN1331387@melbourne.sgi.com>
In-Reply-To: <20060515132936.GN1331387@melbourne.sgi.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-archive-position: 7738
X-ecartis-version: Ecartis v1.0.0
Sender: linux-xfs-bounce@oss.sgi.com
Errors-to: linux-xfs-bounce@oss.sgi.com
X-original-sender: peter@wink.com
Precedence: bulk
X-list: linux-xfs
Content-Length: 3348
Lines: 83

David -

Thanks first off for your reply as well. It was your old postings that
inspired me to even ask my question...

You're right that there is no OOM killer on my system, so the problem is
perhaps unrelated. I'm not sure how the OOM killer would make it
different, but I don't really know what the OOM killer does at a low level.

As for load, the chown process would garner only 3-5% of the CPU according
to top, but the load average would increase by 1 to 2, bringing it up to ~7.

Trying to re-run a small subset of the chowns (to the same user) just now
showed similar behavior, but when I ran it a second time it was *very* fast.
;-) As for version of the log, can I upgrade to version 2 on a running system? ;;peter David Chinner wrote: > On Fri, May 12, 2006 at 12:36:21PM -0700, Peter Broadwell wrote: >> I seem to be having the same problem as CHIKAMA Masaki was having in >> December 7, 2005, >> namely "chown -R" running very slowly when hitting lots of files (~17 >> million in my case). > > The problem is different because there's no OOM killer > being invoked, right? All you see is a slowdown? How much CPU > time is the chmod consuming? > >> I'm most interested in anything to (safely) speed this up on a live file >> system as it >> has been running for nearly 24 hours so far... not hung or corrupted >> anything as far >> as I can tell. > > Well, doing a chmod on a single file requires an inode read, > a log write, and eventually a inode write. > >> xfs_chashlist 205900 385952 32 112 1 : tunables 120 60 8 >> xfs_ili 273754 273760 192 20 1 : tunables 120 60 8 >> xfs_inode 275317 275317 528 7 1 : tunables 54 27 8 >> xfs_vnode 275316 275316 632 6 1 : tunables 54 27 8 >> dentry_cache 252909 252909 224 17 1 : tunables 120 60 8 > > From the inode to cluster ratio (xfs_inode/xfs_chashlist), you've > got very sparse inode clusters, so each inode read and write will do > a disk I/O. So, two I/Os per file chmod() plus a log write every few > files plus directory reads. That makes it roughly 40 million I/Os > to do your recursive chmod. > > On a single disk sustaining 200 I/Os per second, I'd expect it to > take more than a couple of days to complete the recursive chmod. Your > filesystem is going to be slow while this is going on as well. > >> peter@cl1 /data $ df -k >> Filesystem 1K-blocks Used Available Use% Mounted on >> /dev/md/1 449447808 338816792 110631016 76% /data > > peter@cl1 /data $ xfs_info /data > .... > data = bsize=4096 blocks=112394720, imaxpct=25 > = sunit=16 swidth=64 blks, unwritten=1 > > So a 64k stripe unit and 4-unit wide stripe. 
What RAID level are you > using for your stripe? What's the spindle speed of the disks? > > log =internal bsize=4096 blocks=32768, version=1 > > With a 128MB version 1 log. > > If you were using version 2 logs, I'd suggest using a larger > log buffer size to reduce the number of log writes. That would > help quite a bit. Other than that, I can't think of much you > could tune to help here. When you need to do that many I/Os, > the only thing that speeds it up is to have lots of spindles... > > Cheers, > > Dave. From owner-linux-xfs@oss.sgi.com Mon May 15 17:02:06 2006 Received: with ECARTIS (v1.0.0; list linux-xfs); Mon, 15 May 2006 17:02:12 -0700 (PDT) Received: from g0.machinephasesystems.com (dsl092-191-029.sfo1.dsl.speakeasy.net [66.92.191.29]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with ESMTP id k4G0026q013754 for ; Mon, 15 May 2006 17:02:05 -0700 Received: from [192.168.1.67] (g.machinephasesystems.com [66.92.191.28]) by g0.machinephasesystems.com (8.13.6/8.13.6) with ESMTP id k4FLPjR3005465; Mon, 15 May 2006 14:25:45 -0700 Message-ID: <4468F30E.3030405@wink.com> Date: Mon, 15 May 2006 14:30:54 -0700 From: Peter Broadwell User-Agent: Thunderbird 1.5.0.2 (X11/20060420) MIME-Version: 1.0 To: Anders Saaby CC: linux-xfs@oss.sgi.com Subject: Re: deep chmod|chown -R begin to start OOMkiller References: <4464E3B5.8020602@wink.com> <200605151159.34802.as@cohaesio.com> In-Reply-To: <200605151159.34802.as@cohaesio.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 7739 X-ecartis-version: Ecartis v1.0.0 Sender: linux-xfs-bounce@oss.sgi.com Errors-to: linux-xfs-bounce@oss.sgi.com X-original-sender: peter@wink.com Precedence: bulk X-list: linux-xfs Content-Length: 21125 Lines: 333 Anders - First, thanks for the reply. Yes, high CPU load, around 6-7 on a dual AMD Opteron system but much of it may be caused by other things running. Is there some tool that can isolate the load vs. 
D-state hang on a per-process instance?

My chown did finally finish, some 63 hrs later for about 75 chowns/sec.
This is running on a system with 4 SATA 7200 rpm drives configured with
software RAID 10, so it is essentially 2 spindles and we are seeing
about 1/3 of the theoretical maximum. It would of course be nice to do
better, but perhaps this is in bounds for reality?

In looking around I did see an ioctl, XFS_IOC_FSBULKSTAT, that seemed
like it might give a different approach to doing this, but it looked like
it was read only (and lots of work to get anything going with it...)
Is this a worthwhile avenue to look at more deeply?

;;peter

Anders Saaby wrote:
> Hi,
>
> Do you have high CPU usage when running the chown? - Or just processes hanging
> in D-state?
>
> On Friday 12 May 2006 21:36, Peter Broadwell wrote:
>> I seem to be having the same problem as CHIKAMA Masaki was having in
>> December 7, 2005, namely "chown -R" running very slowly when hitting lots
>> of files (~17 million in my case).
>>
>> My machine doesn't have the same constraints that David pointed to as at
>> least part of the problem.
>> I have fast disks, and lots of memory (though perhaps still bad logfile
>> sizes) So I thought I'd feed into the discussion a bit, hoping for any
>> other ideas...
>>
>> I'm most interested in anything to (safely) speed this up on a live file
>> system as it has been running for nearly 24 hours so far... not hung or
>> corrupted anything as far as I can tell.
>>
>> Following is possibly interesting info from uname, /proc/meminfo,
>> /proc/slabinfo, ...
(I don't have OOMkiller though): >> >> Thanks - >> >> ;;peter >> >> = = = = (start of info) = = = = >> >> peter@cl1 /data $ uname -sr >> Linux 2.6.14-gentoo-r2 >> peter@cl1 /data $ cat /proc/meminfo >> MemTotal: 8058120 kB >> MemFree: 2770704 kB >> Buffers: 12 kB >> Cached: 3412304 kB >> SwapCached: 6860 kB >> Active: 2914928 kB >> Inactive: 1673712 kB >> HighTotal: 0 kB >> HighFree: 0 kB >> LowTotal: 8058120 kB >> LowFree: 2770704 kB >> SwapTotal: 32129968 kB >> SwapFree: 32114220 kB >> Dirty: 16 kB >> Writeback: 0 kB >> Mapped: 1191804 kB >> Slab: 666680 kB >> CommitLimit: 36159028 kB >> Committed_AS: 1313628 kB >> PageTables: 4564 kB >> VmallocTotal: 34359738367 kB >> VmallocUsed: 24420 kB >> VmallocChunk: 34359713687 kB >> HugePages_Total: 0 >> HugePages_Free: 0 >> Hugepagesize: 2048 kB >> peter@cl1 /data $ cat /proc/slabinfo >> slabinfo - version: 2.1 >> # name >> : tunables : slabdata >> >> rpc_buffers 8 8 2048 2 1 : tunables 24 12 8 >> : slabdata 4 4 0 rpc_tasks 8 10 384 10 >> 1 : tunables 54 27 8 : slabdata 1 1 0 >> rpc_inode_cache 8 12 832 4 1 : tunables 54 27 8 >> : slabdata 3 3 0 fib6_nodes 7 118 64 59 >> 1 : tunables 120 60 8 : slabdata 2 2 0 >> ip6_dst_cache 7 24 320 12 1 : tunables 54 27 8 >> : slabdata 2 2 0 ndisc_cache 1 15 256 15 >> 1 : tunables 120 60 8 : slabdata 1 1 0 RAWv6 >> 4 4 896 4 1 : tunables 54 27 8 : slabdata >> 1 1 0 UDPv6 1 4 896 4 1 : >> tunables 54 27 8 : slabdata 1 1 0 tw_sock_TCPv6 >> 0 0 192 20 1 : tunables 120 60 8 : slabdata 0 >> 0 0 request_sock_TCPv6 0 0 128 30 1 : tunables >> 120 60 8 : slabdata 0 0 0 TCPv6 6 >> 10 1536 5 2 : tunables 24 12 8 : slabdata 2 2 >> 0 UNIX 41 54 640 6 1 : tunables 54 27 >> 8 : slabdata 9 9 0 tcp_bind_bucket 34 448 32 >> 112 1 : tunables 120 60 8 : slabdata 4 4 0 >> inet_peer_cache 0 0 64 59 1 : tunables 120 60 8 >> : slabdata 0 0 0 ip_fib_alias 14 118 64 59 >> 1 : tunables 120 60 8 : slabdata 2 2 0 ip_fib_hash >> 14 118 64 59 1 : tunables 120 60 8 : slabdata >> 2 2 0 ip_dst_cache 36 48 320 12 1 : 
>> tunables 54 27 8 : slabdata 4 4 0 arp_cache >> 8 30 256 15 1 : tunables 120 60 8 : slabdata 2 >> 2 0 RAW 3 11 704 11 2 : tunables 54 >> 27 8 : slabdata 1 1 0 UDP 16 20 >> 768 5 1 : tunables 54 27 8 : slabdata 4 4 0 >> tw_sock_TCP 23 40 192 20 1 : tunables 120 60 8 >> : slabdata 2 2 0 request_sock_TCP 8 30 128 30 >> 1 : tunables 120 60 8 : slabdata 1 1 0 TCP >> 15 25 1408 5 2 : tunables 24 12 8 : slabdata >> 5 5 0 uhci_urb_priv 0 0 88 44 1 : >> tunables 120 60 8 : slabdata 0 0 0 scsi_cmd_cache >> 29 35 512 7 1 : tunables 54 27 8 : slabdata 5 >> 5 0 cfq_ioc_pool 0 0 96 40 1 : tunables 120 >> 60 8 : slabdata 0 0 0 cfq_pool 0 0 >> 160 24 1 : tunables 120 60 8 : slabdata 0 0 0 >> crq_pool 0 0 88 44 1 : tunables 120 60 8 >> : slabdata 0 0 0 deadline_drq 607 760 96 40 >> 1 : tunables 120 60 8 : slabdata 18 19 480 as_arq >> 0 0 112 34 1 : tunables 120 60 8 : slabdata >> 0 0 0 mqueue_inode_cache 1 4 896 4 1 : >> tunables 54 27 8 : slabdata 1 1 0 xfs_chashlist >> 205900 385952 32 112 1 : tunables 120 60 8 : slabdata 3446 >> 3446 0 xfs_ili 273754 273760 192 20 1 : tunables >> 120 60 8 : slabdata 13688 13688 0 xfs_ifork 0 >> 0 64 59 1 : tunables 120 60 8 : slabdata 0 0 >> 0 xfs_efi_item 0 0 352 11 1 : tunables 54 27 >> 8 : slabdata 0 0 0 xfs_efd_item 0 0 360 >> 11 1 : tunables 54 27 8 : slabdata 0 0 0 >> xfs_buf_item 1 21 184 21 1 : tunables 120 60 8 >> : slabdata 1 1 0 xfs_dabuf 45 288 24 144 >> 1 : tunables 120 60 8 : slabdata 2 2 0 xfs_da_state >> 0 0 488 8 1 : tunables 54 27 8 : slabdata >> 0 0 0 xfs_trans 186 351 872 9 2 : >> tunables 54 27 8 : slabdata 32 39 81 xfs_inode >> 275317 275317 528 7 1 : tunables 54 27 8 : slabdata 39331 >> 39331 0 xfs_btree_cur 0 0 192 20 1 : tunables >> 120 60 8 : slabdata 0 0 0 xfs_bmap_free_item 0 >> 0 24 144 1 : tunables 120 60 8 : slabdata 0 0 >> 0 xfs_buf 288 414 408 9 1 : tunables 54 27 >> 8 : slabdata 45 46 216 xfs_ioend 32 54 144 >> 27 1 : tunables 120 60 8 : slabdata 2 2 0 xfs_vnode >> 275316 275316 632 6 1 : tunables 54 27 8 : 
slabdata >> 45886 45886 0 ntfs_big_inode_cache 0 0 896 4 1 : >> tunables 54 27 8 : slabdata 0 0 0 ntfs_inode_cache >> 0 0 272 14 1 : tunables 54 27 8 : slabdata 0 >> 0 0 ntfs_name_cache 0 0 512 8 1 : tunables 54 >> 27 8 : slabdata 0 0 0 ntfs_attr_ctx_cache 0 0 >> 64 59 1 : tunables 120 60 8 : slabdata 0 0 0 >> ntfs_index_ctx_cache 0 0 128 30 1 : tunables 120 60 >> 8 : slabdata 0 0 0 nfs_write_data 36 36 832 >> 9 2 : tunables 54 27 8 : slabdata 4 4 0 >> nfs_read_data 32 35 768 5 1 : tunables 54 27 8 >> : slabdata 7 7 0 nfs_inode_cache 1 4 912 4 >> 1 : tunables 54 27 8 : slabdata 1 1 0 nfs_page >> 0 0 128 30 1 : tunables 120 60 8 : slabdata >> 0 0 0 isofs_inode_cache 0 0 632 6 1 : >> tunables 54 27 8 : slabdata 0 0 0 fat_inode_cache >> 0 0 664 6 1 : tunables 54 27 8 : slabdata 0 >> 0 0 fat_cache 0 0 32 112 1 : tunables 120 >> 60 8 : slabdata 0 0 0 hugetlbfs_inode_cache 1 >> 6 600 6 1 : tunables 54 27 8 : slabdata 1 1 >> 0 ext2_inode_cache 0 0 744 5 1 : tunables 54 27 >> 8 : slabdata 0 0 0 ext2_xattr 0 0 88 >> 44 1 : tunables 120 60 8 : slabdata 0 0 0 >> journal_handle 0 0 24 144 1 : tunables 120 60 8 >> : slabdata 0 0 0 journal_head 0 0 96 40 >> 1 : tunables 120 60 8 : slabdata 0 0 0 revoke_table >> 0 0 16 202 1 : tunables 120 60 8 : slabdata >> 0 0 0 revoke_record 0 0 32 112 1 : >> tunables 120 60 8 : slabdata 0 0 0 ext3_inode_cache >> 0 0 792 5 1 : tunables 54 27 8 : slabdata 0 >> 0 0 ext3_xattr 0 0 88 44 1 : tunables 120 >> 60 8 : slabdata 0 0 0 reiser_inode_cache 0 0 >> 704 5 1 : tunables 54 27 8 : slabdata 0 0 0 >> dnotify_cache 0 0 40 92 1 : tunables 120 60 8 >> : slabdata 0 0 0 eventpoll_pwq 0 0 72 53 >> 1 : tunables 120 60 8 : slabdata 0 0 0 >> eventpoll_epi 0 0 192 20 1 : tunables 120 60 8 >> : slabdata 0 0 0 inotify_event_cache 0 0 40 >> 92 1 : tunables 120 60 8 : slabdata 0 0 0 >> inotify_watch_cache 1 59 64 59 1 : tunables 120 60 >> 8 : slabdata 1 1 0 kioctx 0 0 320 >> 12 1 : tunables 54 27 8 : slabdata 0 0 0 kiocb >> 0 0 256 15 1 : tunables 120 60 8 : 
slabdata >> 0 0 0 fasync_cache 0 0 24 144 1 : >> tunables 120 60 8 : slabdata 0 0 0 shmem_inode_cache >> 840 850 792 5 1 : tunables 54 27 8 : slabdata 170 >> 170 0 posix_timers_cache 0 0 168 23 1 : tunables >> 120 60 8 : slabdata 0 0 0 uid_cache 9 >> 118 64 59 1 : tunables 120 60 8 : slabdata 2 2 >> 0 sgpool-128 32 32 4096 1 1 : tunables 24 12 >> 8 : slabdata 32 32 0 sgpool-64 32 32 2048 >> 2 1 : tunables 24 12 8 : slabdata 16 16 0 sgpool-32 >> 32 32 1024 4 1 : tunables 54 27 8 : slabdata >> 8 8 0 sgpool-16 45 48 512 8 1 : >> tunables 54 27 8 : slabdata 6 6 0 sgpool-8 >> 52 60 256 15 1 : tunables 120 60 8 : slabdata 4 >> 4 0 blkdev_ioc 114 201 56 67 1 : tunables 120 >> 60 8 : slabdata 3 3 0 blkdev_queue 31 44 >> 712 11 2 : tunables 54 27 8 : slabdata 4 4 0 >> blkdev_requests 311 630 264 15 1 : tunables 54 27 8 >> : slabdata 40 42 216 biovec-(256) 256 256 4096 1 >> 1 : tunables 24 12 8 : slabdata 256 256 0 biovec-128 >> 256 256 2048 2 1 : tunables 24 12 8 : slabdata >> 128 128 0 biovec-64 256 256 1024 4 1 : >> tunables 54 27 8 : slabdata 64 64 0 biovec-16 >> 285 285 256 15 1 : tunables 120 60 8 : slabdata 19 >> 19 0 biovec-4 864 1652 64 59 1 : tunables 120 >> 60 8 : slabdata 27 28 480 biovec-1 482 1616 >> 16 202 1 : tunables 120 60 8 : slabdata 8 8 108 >> bio 860 1500 128 30 1 : tunables 120 60 8 >> : slabdata 50 50 480 file_lock_cache 6 24 160 24 >> 1 : tunables 120 60 8 : slabdata 1 1 0 >> sock_inode_cache 93 130 704 5 1 : tunables 54 27 8 >> : slabdata 26 26 0 skbuff_fclone_cache 20 32 448 >> 8 1 : tunables 54 27 8 : slabdata 3 4 0 >> skbuff_head_cache 555 1035 256 15 1 : tunables 120 60 8 >> : slabdata 69 69 0 acpi_operand 1127 1166 72 53 >> 1 : tunables 120 60 8 : slabdata 22 22 0 >> acpi_parse_ext 0 0 64 59 1 : tunables 120 60 8 >> : slabdata 0 0 0 acpi_parse 0 0 40 92 >> 1 : tunables 120 60 8 : slabdata 0 0 0 acpi_state >> 0 0 88 44 1 : tunables 120 60 8 : slabdata >> 0 0 0 proc_inode_cache 667 690 616 6 1 : >> tunables 54 27 8 : slabdata 115 115 0 sigqueue 
>> 32 46 168 23 1 : tunables 120 60 8 : slabdata 2 >> 2 0 radix_tree_node 232625 303359 536 7 1 : tunables 54 >> 27 8 : slabdata 43337 43337 0 bdev_cache 22 28 >> 832 4 1 : tunables 54 27 8 : slabdata 7 7 0 >> sysfs_dir_cache 2946 3021 72 53 1 : tunables 120 60 8 >> : slabdata 57 57 0 mnt_cache 26 60 192 20 >> 1 : tunables 120 60 8 : slabdata 3 3 0 inode_cache >> 1080 1085 584 7 1 : tunables 54 27 8 : slabdata >> 155 155 0 dentry_cache 252909 252909 224 17 1 : >> tunables 120 60 8 : slabdata 14877 14877 0 filp >> 883 1365 256 15 1 : tunables 120 60 8 : slabdata 91 >> 91 0 names_cache 3 5 4096 1 1 : tunables 24 >> 12 8 : slabdata 3 5 0 idr_layer_cache 77 84 >> 528 7 1 : tunables 54 27 8 : slabdata 12 12 0 >> buffer_head 52111 139612 88 44 1 : tunables 120 60 8 >> : slabdata 3173 3173 0 mm_struct 67 77 1152 7 >> 2 : tunables 24 12 8 : slabdata 11 11 0 >> vm_area_struct 2672 2814 184 21 1 : tunables 120 60 8 >> : slabdata 134 134 0 fs_cache 76 118 64 59 >> 1 : tunables 120 60 8 : slabdata 2 2 0 files_cache >> 66 72 896 4 1 : tunables 54 27 8 : slabdata >> 18 18 0 signal_cache 109 120 640 6 1 : >> tunables 54 27 8 : slabdata 20 20 0 sighand_cache >> 103 108 2112 3 2 : tunables 24 12 8 : slabdata 36 >> 36 0 task_struct 123 128 1728 4 2 : tunables 24 >> 12 8 : slabdata 32 32 0 anon_vma 987 1440 >> 24 144 1 : tunables 120 60 8 : slabdata 10 10 0 >> shared_policy_node 0 0 56 67 1 : tunables 120 60 8 >> : slabdata 0 0 0 numa_policy 39 404 16 202 >> 1 : tunables 120 60 8 : slabdata 2 2 0 >> size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 >> : slabdata 0 0 0 size-131072 0 0 131072 1 >> 32 : tunables 8 4 0 : slabdata 0 0 0 >> size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 >> : slabdata 0 0 0 size-65536 2 2 65536 1 >> 16 : tunables 8 4 0 : slabdata 2 2 0 >> size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 >> : slabdata 0 0 0 size-32768 20 20 32768 1 >> 8 : tunables 8 4 0 : slabdata 20 20 0 >> size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 >> : slabdata 0 0 0 size-16384 0 0 
16384 1 >> 4 : tunables 8 4 0 : slabdata 0 0 0 >> size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 >> : slabdata 0 0 0 size-8192 17 17 8192 1 >> 2 : tunables 8 4 0 : slabdata 17 17 0 >> size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 8 >> : slabdata 0 0 0 size-4096 269 270 4096 1 >> 1 : tunables 24 12 8 : slabdata 269 270 0 >> size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 8 >> : slabdata 0 0 0 size-2048 708 736 2048 2 >> 1 : tunables 24 12 8 : slabdata 363 368 0 >> size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 8 >> : slabdata 0 0 0 size-1024 350 368 1024 4 >> 1 : tunables 54 27 8 : slabdata 92 92 0 >> size-512(DMA) 0 0 512 8 1 : tunables 54 27 8 >> : slabdata 0 0 0 size-512 619 640 512 8 >> 1 : tunables 54 27 8 : slabdata 80 80 0 >> size-256(DMA) 0 0 256 15 1 : tunables 120 60 8 >> : slabdata 0 0 0 size-256 82 105 256 15 >> 1 : tunables 120 60 8 : slabdata 7 7 0 >> size-192(DMA) 0 0 192 20 1 : tunables 120 60 8 >> : slabdata 0 0 0 size-192 1560 2000 192 20 >> 1 : tunables 120 60 8 : slabdata 100 100 0 >> size-128(DMA) 0 0 128 30 1 : tunables 120 60 8 >> : slabdata 0 0 0 size-64(DMA) 0 0 64 59 >> 1 : tunables 120 60 8 : slabdata 0 0 0 size-64 >> 2672 9027 64 59 1 : tunables 120 60 8 : slabdata >> 153 153 0 size-32(DMA) 0 0 32 112 1 : >> tunables 120 60 8 : slabdata 0 0 0 size-128 >> 3807 4950 128 30 1 : tunables 120 60 8 : slabdata 165 >> 165 300 size-32 703 784 32 112 1 : tunables 120 >> 60 8 : slabdata 7 7 0 kmem_cache 155 155 >> 704 5 1 : tunables 54 27 8 : slabdata 31 31 0 >> peter@cl1 /data $ >> >> peter@cl1 /data $ df -k >> Filesystem 1K-blocks Used Available Use% Mounted on >> /dev/md/0 22479104 13991500 8487604 63% / >> udev 4029060 244 4028816 1% /dev >> /dev/md/1 449447808 338816792 110631016 76% /data >> none 4029060 0 4029060 0% /dev/shm >> cl4:/data 451279232 112298760 338980472 25% /mnt/cl4-data >> peter@cl1 /data $ xfs_info /data >> meta-data=/dev/md1 isize=256 agcount=16, agsize=7024672 >> blks = sectsz=512 >> data = bsize=4096 blocks=112394720, imaxpct=25 >> 
= sunit=16 swidth=64 blks, unwritten=1 >> naming =version 2 bsize=4096 >> log =internal bsize=4096 blocks=32768, version=1 >> = sectsz=512 sunit=0 blks >> realtime =none extsz=262144 blocks=0, rtextents=0 >> peter@cl1 /data $ >> >> = = = = (end of info) = = = = > From owner-linux-xfs@oss.sgi.com Mon May 15 20:14:16 2006 Received: with ECARTIS (v1.0.0; list linux-xfs); Mon, 15 May 2006 20:14:18 -0700 (PDT) Received: from g0.machinephasesystems.com (dsl092-191-029.sfo1.dsl.speakeasy.net [66.92.191.29]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with ESMTP id k4G3CDkT003130 for ; Mon, 15 May 2006 20:14:16 -0700 Received: from [192.168.1.67] (g.machinephasesystems.com [66.92.191.28]) by g0.machinephasesystems.com (8.13.6/8.13.6) with ESMTP id k4G36uYk007208; Mon, 15 May 2006 20:06:56 -0700 Message-ID: <44694306.9030407@wink.com> Date: Mon, 15 May 2006 20:12:06 -0700 From: Peter Broadwell User-Agent: Thunderbird 1.5.0.2 (X11/20060420) MIME-Version: 1.0 To: David Chinner CC: Anders Saaby , linux-xfs@oss.sgi.com Subject: Re: deep chmod|chown -R begin to start OOMkiller References: <4464E3B5.8020602@wink.com> <20060515132936.GN1331387@melbourne.sgi.com> <4468F967.6090202@wink.com> <4464E3B5.8020602@wink.com> <200605151159.34802.as@cohaesio.com> <4468F30E.3030405@wink.com> <20060516013408.GB1390195@melbourne.sgi.com> In-Reply-To: <20060516013408.GB1390195@melbourne.sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 7742 X-ecartis-version: Ecartis v1.0.0 Sender: linux-xfs-bounce@oss.sgi.com Errors-to: linux-xfs-bounce@oss.sgi.com X-original-sender: peter@wink.com Precedence: bulk X-list: linux-xfs Content-Length: 2416 Lines: 59 David Chinner wrote: > On Mon, May 15, 2006 at 02:30:54PM -0700, Peter Broadwell wrote: >> My chown did finally finish, some 63 hrs later for about 75 chowns/sec. 
>> This is running on a system with 4 SATA 7200 rpm drives configured with
>> software RAID 10 so it is essentially 2 spindles and we are seeing
>> about 1/3 of the theoretical maximum.
>
> If you call a SWAG a "theoretical maximum". All your results indicate
> is that my guess was in the same ballpark as reality.

Such a well-informed SWAG is an indication of good breeding ;-)

>> In looking around I did see an ioctl, XFS_IOC_FSBULKSTAT, that seemed
>> like it might give a different approach to doing this, but looked like it
>> was read only (and lots of work to get anything going with it...)
>> Is this a worthwhile avenue to look at more deeply?
>
> Read only, and does not follow any directory structure - it just reads
> the inodes off disk in ascending block order....

Well, *if* I had to do this often I would think a write version of this
ioctl might reduce by at least 1/3 the number of disk writes, no?

It also seems funny that I could copy the whole disk in less time than
it took me to chown files that are filling up less than 1/2 of it...

Fortunately I don't expect to have to do this again, and if I do,
I'll know it will be a long running process.

Thanks again for your help in understanding what is probably happening.

;;peter

> On Mon, May 15, 2006 at 02:57:59PM -0700, Peter Broadwell wrote:
>> As for load, the chown process would garner only 3-5% of the CPU according
>> to top, but the load average would increase by 1 to 2, bringing it up to ~7.
>
> A single process being I/O bound like this will contribute 1 to the
> load average.
>
>> Trying to re-run a small subset of the chowns (to the same user) just now
>> showed similar behavior, but when I ran it a second time it was *very*
>> fast. ;-)
>
> My guess would be that the first time it ran it needed to read all
> the inodes in off disk. The second time they were in cache, and the
> subset probably fit in the log so the only I/O would be log I/O.
> Hence the second run would be very fast....
> >> As for version of the log, can I upgrade to version 2 on a running system? > > I know there is on Irix (xfs_chver) which is a perl script wrapper for > xfs_db, but I'm not sure if there is an equivalent shipped on linux. > Nathan? > > Cheers, > > Dave. From owner-linux-xfs@oss.sgi.com Mon May 15 22:11:56 2006 Received: with ECARTIS (v1.0.0; list linux-xfs); Mon, 15 May 2006 22:11:58 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with SMTP id k4G59q7M021577 for ; Mon, 15 May 2006 22:11:55 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA27952; Tue, 16 May 2006 15:09:49 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id k4G59kh32087520; Tue, 16 May 2006 15:09:46 +1000 (AEST) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id k4G59hwG2087302; Tue, 16 May 2006 15:09:43 +1000 (AEST) Date: Tue, 16 May 2006 15:09:43 +1000 From: David Chinner To: Peter Broadwell Cc: David Chinner , Anders Saaby , linux-xfs@oss.sgi.com Subject: Re: deep chmod|chown -R begin to start OOMkiller Message-ID: <20060516050943.GI1390195@melbourne.sgi.com> References: <4464E3B5.8020602@wink.com> <20060515132936.GN1331387@melbourne.sgi.com> <4468F967.6090202@wink.com> <4464E3B5.8020602@wink.com> <200605151159.34802.as@cohaesio.com> <4468F30E.3030405@wink.com> <20060516013408.GB1390195@melbourne.sgi.com> <44694306.9030407@wink.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <44694306.9030407@wink.com> User-Agent: Mutt/1.4.2.1i X-archive-position: 7744 X-ecartis-version: Ecartis v1.0.0 Sender: linux-xfs-bounce@oss.sgi.com Errors-to: linux-xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: 
linux-xfs Content-Length: 1498 Lines: 41

On Mon, May 15, 2006 at 08:12:06PM -0700, Peter Broadwell wrote:
> David Chinner wrote:
> >>In looking around I did see an ioctl, XFS_IOC_FSBULKSTAT, that seemed
> >>like it might give a different approach to doing this, but looked like it
> >>was read only (and lots of work to get anything going with it...)
> >>Is this a worthwhile avenue to look at more deeply?
> >
> >Read only, and does not follow any directory structure - it just reads
> >the inodes off disk in ascending block order....
>
> Well, *if* I had to do this often I would think a write version of this
> ioctl might reduce by at least 1/3 the number of disk writes, no?

If you wanted to change every inode in the filesystem, then yes, it could
be done this way (e.g. an inode cluster at a time). And the difference in
I/Os would be more like an order of magnitude. However, a write version is
not quite that simple because you still have to log all the changes you
make.....

> It also seems funny that I could copy the whole disk in less time than
> it took me to chown files that are filling up less than 1/2 of it...

Yep, that's the difference between doing large sequential I/Os for the
disk copy and small, random I/Os for the chown...

> Fortunately I don't expect to have to do this again, and if I do,
> I'll know it will be a long running process.
>
> Thanks again for your help in understanding what is probably happening.

np.

Cheers,

Dave.
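
As a rough illustration of the access pattern described above: a recursive chown visits entries in directory-walk order, not in on-disk inode order, which is one way to picture why it ends up issuing small, scattered I/Os. The Python sketch below only approximates what `chown -R` does; the function name `chown_tree` is hypothetical and is not part of any XFS or coreutils tool.

```python
import os


def chown_tree(root, uid, gid):
    """Change ownership of root and everything beneath it.

    Entries are visited in directory-walk order, so the inodes touched
    are not, in general, adjacent on disk. Returns the number of entries
    on which lchown was called. Passing -1 for uid or gid leaves that
    field unchanged.
    """
    changed = 0
    os.lchown(root, uid, gid)  # lchown: do not follow a symlink at root
    changed += 1
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            os.lchown(os.path.join(dirpath, name), uid, gid)
            changed += 1
    return changed
```

Each call modifies one inode, and every such change still has to go through the log, so the cost grows with the number of entries rather than with the amount of data stored.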
-- 
Dave Chinner
R&D Software Engineer
SGI Australian Software Group

From owner-linux-xfs@oss.sgi.com Tue May 16 01:55:35 2006 Received: with ECARTIS (v1.0.0; list linux-xfs); Tue, 16 May 2006 01:55:39 -0700 (PDT) Received: from ccerelbas02.cce.hp.com (ccerelbas02.cce.hp.com [161.114.21.105]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with ESMTP id k4G8rYkc021477 for ; Tue, 16 May 2006 01:55:35 -0700 Received: from mailrelay01.cce.cpqcorp.net (relay.dec.com [16.47.68.171]) by ccerelbas02.cce.hp.com (Postfix) with ESMTP id 1998B34020 for ; Tue, 16 May 2006 01:21:40 -0500 (CDT) Received: from flyingAngel.upjs.sk (alienangel.emea.hpqcorp.net [16.55.206.67]) by mailrelay01.cce.cpqcorp.net (Postfix) with ESMTP id 917771633 for ; Tue, 16 May 2006 01:21:40 -0500 (CDT) Received: by flyingAngel.upjs.sk (Postfix, from userid 500) id 71E661DE9A5; Tue, 16 May 2006 08:21:28 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by flyingAngel.upjs.sk (Postfix) with ESMTP id 6BBB798 for ; Tue, 16 May 2006 08:21:28 +0200 (CEST) Date: Tue, 16 May 2006 08:21:27 +0200 (CEST) From: Jan Derfinak To: linux-xfs@oss.sgi.com Subject: CVS access Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 7745 X-ecartis-version: Ecartis v1.0.0 Sender: linux-xfs-bounce@oss.sgi.com Errors-to: linux-xfs-bounce@oss.sgi.com X-original-sender: ja@mail.upjs.sk Precedence: bulk X-list: linux-xfs Content-Length: 432 Lines: 15

Hello.

For the last few days I have had a problem updating the kernel from the
SGI CVS tree. I would like to ask what is wrong.

Thanks.

jan

xfs/linux-2.6-xfs %49> cvs -z5 update -dP
cvs update: Updating .
cvs update: failed to create lock directory for `/cvs/linux-2.6-xfs' (/var/lock/cvs/linux-2.6-xfs/#cvs.lock): Permission denied
cvs update: failed to obtain dir lock in repository `/cvs/linux-2.6-xfs'
cvs [update aborted]: read lock failed - giving up

From owner-linux-xfs@oss.sgi.com Wed May 17 15:15:33 2006 Received: with ECARTIS (v1.0.0; list linux-xfs); Wed, 17 May 2006 15:15:39 -0700 (PDT) Received: from coraid.com (ns1.coraid.com [65.14.39.133]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with ESMTP id k4HMDV9u005552 for ; Wed, 17 May 2006 15:15:32 -0700 Received: from coraid.com ([205.185.197.207]) by coraid.com; Wed May 17 15:35:26 EDT 2006 Date: Wed, 17 May 2006 15:36:06 -0400 From: "Ed L. Cashin" To: linux-xfs@oss.sgi.com Subject: xfs_repair on large fs: out of memory Message-ID: <20060517193606.GO32378@coraid.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.11+cvs20060126 X-archive-position: 7750 X-ecartis-version: Ecartis v1.0.0 Sender: linux-xfs-bounce@oss.sgi.com Errors-to: linux-xfs-bounce@oss.sgi.com X-original-sender: ecashin@coraid.com Precedence: bulk X-list: linux-xfs Content-Length: 1648 Lines: 42

Hi. In trying to run xfs_repair on a 13 TB filesystem, the program
either runs out of memory ...

host02:~# sysctl vm.overcommit_memory=2
vm.overcommit_memory = 2
host02:~# /opt/xfsprogs-2.7.11/sbin/xfs_repair -v -n /dev/mapper/vg-lv
Phase 1 - find and verify superblock...

fatal error -- couldn't allocate block map, size = 106834464
host02:~#

... or just uses up all the system's memory and is killed if I use a
more liberal vm.overcommit_memory setting. An strace shows that mmap
is being called repeatedly.

...
mmap(NULL, 53420032, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b2201d65000
mmap(NULL, 53420032, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b2205057000
mmap(NULL, 53420032, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b2208349000
mmap(NULL, 53420032, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b220b63b000
+++ killed by SIGKILL +++
Process 3146 detached

This machine has a gigabyte of memory. It's running an x86_64
2.6.16.13 kernel.

I think that the XFS log is not OK because on mount attempts the
messages below appear in the logs.

kernel: XFS mounting filesystem dm-0
kernel: Starting XFS recovery on filesystem: dm-0 (logdev: internal)
kernel: XFS: xlog_recover_process_data: bad clientid
kernel: XFS: log mount/recovery failed: error 5
kernel: XFS: log mount failed

Based on the "bad clientid" message, I think that "xfs_repair -L"
would be the next logical step, but it seems like there must be a bug
in xfs_repair if it runs out of memory instead of telling me that.

-- 
Ed L Cashin

From owner-linux-xfs@oss.sgi.com Wed May 17 15:26:11 2006 Received: with ECARTIS (v1.0.0; list linux-xfs); Wed, 17 May 2006 15:26:14 -0700 (PDT) Received: from smtp101.sbc.mail.mud.yahoo.com (smtp101.sbc.mail.mud.yahoo.com [68.142.198.200]) by oss.sgi.com (8.13.6/8.12.10/SuSE Linux 0.7) with SMTP id k4HMO8Eh006862 for ; Wed, 17 May 2006 15:26:11 -0700 Received: (qmail 19718 invoked from network); 17 May 2006 22:24:01 -0000 Received: from unknown (HELO stupidest.org) (cwedgwood@sbcglobal.net@67.164.15.140 with login) by smtp101.sbc.mail.mud.yahoo.com with SMTP; 17 May 2006 22:24:01 -0000 Received: by taniwha.stupidest.org (Postfix, from userid 38689) id 501CB51FAC0; Wed, 17 May 2006 15:23:53 -0700 (PDT) Date: Wed, 17 May 2006 15:23:53 -0700 From: Chris Wedgwood To: "Ed L. 
Cashin" Cc: linux-xfs@oss.sgi.com Subject: Re: xfs_repair on large fs: out of memory Message-ID: <20060517222353.GA32668@taniwha.stupidest.org> References: <20060517193606.GO32378@coraid.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20060517193606.GO32378@coraid.com> X-archive-position: 7751 X-ecartis-version: Ecartis v1.0.0 Sender: linux-xfs-bounce@oss.sg