From owner-xfs@oss.sgi.com Thu Nov 1 03:16:07 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 01 Nov 2007 03:16:10 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA1AG6bR006267 for ; Thu, 1 Nov 2007 03:16:07 -0700 Received: from hch by pentafluge.infradead.org with local (Exim 4.63 #1 (Red Hat Linux)) id 1InWqS-0005F4-D3; Thu, 01 Nov 2007 10:00:12 +0000 Date: Thu, 1 Nov 2007 10:00:12 +0000 From: Christoph Hellwig To: Lachlan McIlroy Cc: xfs-dev , xfs-oss Subject: Re: [PATCH] Turn off XBF_READ_AHEAD in io completion Message-ID: <20071101100012.GA20065@infradead.org> References: <47296FF7.8080607@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <47296FF7.8080607@sgi.com> User-Agent: Mutt/1.4.2.3i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-Virus-Scanned: ClamAV 0.91.2/4655/Thu Nov 1 00:41:48 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13514 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs On Thu, Nov 01, 2007 at 05:19:35PM +1100, Lachlan McIlroy wrote: > Read-ahead of an inode cluster will set XBF_READ_AHEAD in the buffer. > If we don't remove the flag it will still be set when we flush the > buffer back to disk. Not sure if leaving this flag set causes any > serious problems but it does trigger an assert. It might be better if such temporary flags never actually make it to bp->b_flags. 
Just pass down a flags variable all the way to _xfs_buf_ioapply and keep the flags just for this I/O separate from those that are permanent and in bp->b_flags. From owner-xfs@oss.sgi.com Thu Nov 1 11:58:26 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 01 Nov 2007 11:58:32 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.3 required=5.0 tests=AWL,BAYES_00, RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.0-r574664 Received: from postfix2-g20.free.fr (postfix2-g20.free.fr [212.27.60.43]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA1IwO21023920 for ; Thu, 1 Nov 2007 11:58:25 -0700 Received: from smtp7-g19.free.fr (smtp7-g19.free.fr [212.27.42.64]) by postfix2-g20.free.fr (Postfix) with ESMTP id E92D11D7F582 for ; Thu, 1 Nov 2007 17:57:30 +0100 (CET) Received: from smtp7-g19.free.fr (localhost [127.0.0.1]) by smtp7-g19.free.fr (Postfix) with ESMTP id 246E33227EA; Thu, 1 Nov 2007 19:58:23 +0100 (CET) Received: from galadriel.home (pla78-1-82-235-234-79.fbx.proxad.net [82.235.234.79]) by smtp7-g19.free.fr (Postfix) with ESMTP id 06C5932283B; Thu, 1 Nov 2007 19:58:21 +0100 (CET) Date: Thu, 1 Nov 2007 19:58:12 +0100 From: Emmanuel Florac To: Joshua Baker-LePain Cc: "paul.lkw" , xfs@oss.sgi.com Subject: Re: 2.6TB Storage Size Problem Message-ID: <20071101195812.7355aa92@galadriel.home> In-Reply-To: References: <13501909.post@talk.nabble.com> Organization: Intellique X-Mailer: Claws Mail 2.9.1 (GTK+ 2.8.20; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 X-Virus-Scanned: ClamAV 0.91.2/4659/Thu Nov 1 09:24:40 2007 on oss.sgi.com X-Virus-Status: Clean Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id lA1IwQ21023932 X-archive-position: 13515 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: eflorac@intellique.com 
Precedence: bulk
X-list: xfs

On Tue, 30 Oct 2007 23:30:04 -0400 (EDT) you wrote:

> 3) You can't boot from such a device (as neither grub nor lilo
> support gpt disklabels).

lilo does support booting from gpt on Debian since Sarge at least. I'd be surprised if the CentOS build doesn't.

--
--------------------------------------------------
Emmanuel Florac www.intellique.com
--------------------------------------------------

From owner-xfs@oss.sgi.com Thu Nov 1 13:22:16 2007
Received: with ECARTIS (v1.0.0; list xfs); Thu, 01 Nov 2007 13:22:19 -0700 (PDT)
X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com
X-Spam-Level: ***
X-Spam-Status: No, score=3.0 required=5.0 tests=BAYES_50,HTML_MESSAGE autolearn=no version=3.3.0-r574664
Received: from SVITS26.main.ad.rit.edu (svits26.main.ad.rit.edu [129.21.18.136]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA1KMFsq003297 for ; Thu, 1 Nov 2007 13:22:16 -0700
X-MimeOLE: Produced By Microsoft Exchange V6.5
MIME-Version: 1.0
Subject: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c
Date: Thu, 1 Nov 2007 16:06:35 -0400
Message-ID: <06CCEA2EB1B80A4A937ED59005FA855101AED1BE@svits26.main.ad.rit.edu>
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
Thread-Topic: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c
Thread-Index: AcgcwrSV+pTfvWMBRwuhoBrzLC8Qsg==
From: "Jay Sullivan"
To:
X-Virus-Scanned: ClamAV 0.91.2/4659/Thu Nov 1 09:24:40 2007 on oss.sgi.com
X-Virus-Status: Clean
Content-Type: text/plain
Content-Disposition: inline
Content-Transfer-Encoding: 7bit
Content-length: 3008
X-archive-position: 13516
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: jpspgd@rit.edu
Precedence: bulk
X-list: xfs

I have an XFS filesystem that has had the following happen twice in 3 months, both times because an impossibly large block number was requested.
Unfortunately my logs don't go back far enough for me to know if it was the _exact_ same block both times... I'm running xfsprogs 2.8.21. Excerpt from syslog (hostname obfuscated to 'servername' to protect the innocent):

##
Nov 1 14:06:32 servername dm-1: rw=0, want=39943195856896, limit=7759462400
Nov 1 14:06:32 servername I/O error in filesystem ("dm-1") meta-data dev dm-1 block 0x245400000ff8 ("xfs_trans_read_buf") error 5 buf count 4096
Nov 1 14:06:32 servername xfs_force_shutdown(dm-1,0x1) called from line 415 of file fs/xfs/xfs_trans_buf.c. Return address = 0xc02baa25
Nov 1 14:06:32 servername Filesystem "dm-1": I/O Error Detected. Shutting down filesystem: dm-1
Nov 1 14:06:32 servername Please umount the filesystem, and rectify the problem(s)
###

I ran xfs_repair -L on the FS and it could be mounted again, but how long until it happens a third time? What concerns me is that this is a FS smaller than 4TB and 39943195856896 (or 0x245400000ff8) seems like a block that I would only have if my FS was muuuuuch larger.
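
The numbers in the log excerpt can be cross-checked with a little arithmetic. The sketch below is not from the original message; it assumes that want=, limit= and the reported block are all counted in 512-byte sectors, an assumption the figures themselves are consistent with. The helper names are hypothetical, and the 969932800 block count is the blocks= value from the xfs_info output quoted in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed unit for want=, limit= and the reported block: 512-byte sectors. */
#define SECTOR_BYTES   512u
/* bsize=4096 as reported by xfs_info for this filesystem. */
#define FS_BLOCK_BYTES 4096u

/* Size in sectors of a device that holds fs_blocks filesystem blocks. */
uint64_t fs_blocks_to_sectors(uint64_t fs_blocks)
{
    return fs_blocks * (FS_BLOCK_BYTES / SECTOR_BYTES);
}

/* First sector past the end of a read of byte_count bytes. */
uint64_t read_end_sector(uint64_t start_sector, uint64_t byte_count)
{
    return start_sector + byte_count / SECTOR_BYTES;
}

/* Hypothetical helper: checks the figures from the log against each other. */
void check_log_numbers(void)
{
    uint64_t limit = fs_blocks_to_sectors(969932800ULL);
    uint64_t want  = read_end_sector(0x245400000ff8ULL, 4096);

    assert(limit == 7759462400ULL);     /* matches limit= in the log */
    assert(want == 39943195856896ULL);  /* matches want= in the log */
    assert(want > limit);               /* the read ends far past the device */
}
```

Under that assumption, 0x245400000ff8 is the first sector of the 4096-byte read and want= is its end, eight sectors later, while limit= is simply the filesystem size expressed in sectors. The requested address is roughly five thousand times larger than the device.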
The following is output from some pertinent programs:

###
servername ~ # xfs_info /mnt/san
meta-data=/dev/servername-sanvg01/servername-sanlv01 isize=256    agcount=5, agsize=203161600 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=969932800, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0

servername ~ # mount
/dev/sda3 on / type ext3 (rw,noatime,acl)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec)
udev on /dev type tmpfs (rw,nosuid)
devpts on /dev/pts type devpts (rw,nosuid,noexec)
shm on /dev/shm type tmpfs (rw,noexec,nosuid,nodev)
usbfs on /proc/bus/usb type usbfs (rw,noexec,nosuid,devmode=0664,devgid=85)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,noexec,nosuid,nodev)
nfsd on /proc/fs/nfsd type nfsd (rw)
/dev/mapper/servername--sanvg01-servername--sanlv01 on /mnt/san type xfs (rw,noatime,nodiratime,logbufs=8,attr2)
/dev/mapper/servername--sanvg01-servername--rendersharelv01 on /mnt/san/rendershare type xfs (rw,noatime,nodiratime,logbufs=8,attr2)
rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)

servername ~ # uname -a
Linux servername 2.6.20-gentoo-r8 #7 SMP Fri Jun 29 14:46:02 EDT 2007 i686 Intel(R) Xeon(TM) CPU 3.20GHz GenuineIntel GNU/Linux
###

Does anyone know if this points to a bad block on a disk or if something is corrupted and can be fixed with some expert knowledge of xfs_db?
~Jay [[HTML alternate version deleted]] From owner-xfs@oss.sgi.com Thu Nov 1 15:47:49 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 01 Nov 2007 15:47:53 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA1MlkaL017277 for ; Thu, 1 Nov 2007 15:47:48 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA29284; Fri, 2 Nov 2007 09:47:46 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lA1MljdD90344303; Fri, 2 Nov 2007 09:47:45 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lA1MliWs89544045; Fri, 2 Nov 2007 09:47:44 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Fri, 2 Nov 2007 09:47:44 +1100 From: David Chinner To: Lachlan McIlroy Cc: David Chinner , xfs@oss.sgi.com, xfs-dev@sgi.com Subject: Re: [PATCH] Implement fallocate Message-ID: <20071101224744.GE995458@sgi.com> References: <20071029233841.GT995458@sgi.com> <472928C1.5080707@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <472928C1.5080707@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4659/Thu Nov 1 09:24:40 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13517 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Thu, Nov 01, 2007 at 12:15:45PM +1100, Lachlan McIlroy wrote: > >+ 
xfs_ilock(XFS_I(inode), XFS_IOLOCK_EXCL); > >+ error = xfs_change_file_space(XFS_I(inode), XFS_IOC_RESVSP, &bf, > >+ 0, NULL, ATTR_NOLOCK); > >+ if (!error && !(mode & FALLOC_FL_KEEP_SIZE) && > >+ offset + len > i_size_read(inode)) > >+ new_size = offset + len; > >+ > >+ /* Change file size if needed */ > >+ if (new_size) { > >+ bhv_vattr_t va; > >+ > >+ va.va_mask = XFS_AT_SIZE; > >+ va.va_size = new_size; > >+ error = xfs_setattr(XFS_I(inode), &va, ATTR_NOLOCK, NULL); > >+ } > > Is it necessary to call xfs_setattr() here? Could we just do an explicit > call to xfs_zero_eof(), set the new size, set i_update_core/size and mark > the inode dirty? Hmmm, then again, that approach wouldn't be as clean as > above. And it also violates the atomicity that posix_fallocate is supposed to provide. i.e. if it returns success, the change of file size must be permanent. i.e. the change of size needs to be recorded in a transaction. Hence we need to call xfs_setattr.... Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Thu Nov 1 15:54:59 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 01 Nov 2007 15:55:01 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA1MssbB018151 for ; Thu, 1 Nov 2007 15:54:57 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA29422; Fri, 2 Nov 2007 09:54:54 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lA1MssdD90412464; Fri, 2 Nov 2007 09:54:54 +1100 (AEDT) Received: (from dgc@localhost) by 
snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lA1Msr2Y90572624; Fri, 2 Nov 2007 09:54:53 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Fri, 2 Nov 2007 09:54:53 +1100 From: David Chinner To: Lachlan McIlroy Cc: David Chinner , xfs@oss.sgi.com, xfs-dev@sgi.com Subject: Re: [PATCH] fix transaction overrun during writeback Message-ID: <20071101225453.GF995458@sgi.com> References: <20071029234010.GU995458@sgi.com> <4729304A.2010202@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4729304A.2010202@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4659/Thu Nov 1 09:24:40 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13518 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Thu, Nov 01, 2007 at 12:47:54PM +1100, Lachlan McIlroy wrote: > Looks good Dave. Since this is a writeback path is there some way > we can tell xfs_bmapi() that it should not convert anything but > delayed allocs and have it assert/error out if it tries to - not > that it will now with this change but just as defensive measure? I looked at that, but it's not straight forward. In this case we are simply asking for an allocation, assuming the range we ask for is already delalloc. however, the same call could be used to allocate the space if the transaction reservation took into account the space needing to be allocated. So there's not really any simple way to deal with this, esp. as it is valid to allocate both delalloc and unreserved space in the one xfs_bmapi() call as long as you do the right thing with the transaction reservation... We really need to fix the way xfs_iomap works so we don't have the race condition in the first place.... Cheers, Dave. 
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Thu Nov 1 17:30:23 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 01 Nov 2007 17:30:26 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA20UI7e001094 for ; Thu, 1 Nov 2007 17:30:21 -0700 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA02245; Fri, 2 Nov 2007 11:30:17 +1100 Received: by chook.melbourne.sgi.com (Postfix, from userid 16346) id 6D3EE58C38F7; Fri, 2 Nov 2007 11:30:17 +1100 (EST) To: sgi.bugs.xfs@engr.sgi.com Cc: xfs@oss.sgi.com Subject: PARTIAL TAKE 971186 - Clean up bitops Message-Id: <20071102003017.6D3EE58C38F7@chook.melbourne.sgi.com> Date: Fri, 2 Nov 2007 11:30:17 +1100 (EST) From: dgc@sgi.com (David Chinner) X-Virus-Scanned: ClamAV 0.91.2/4659/Thu Nov 1 09:24:40 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13519 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Use the generic bitops rather than implementing them ourselves. Patch inspired by Andi Kleen. Date: Fri Nov 2 11:29:35 AEDT 2007 Workarea: chook.melbourne.sgi.com:/build/dgc/isms/2.6.x-xfs Inspected by: hch@infradead.org The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/linux/2.6.x-xfs-melb Modid: xfs-linux-melb:xfs-kern:30000a fs/xfs/xfs_bit.h - 1.21 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_bit.h.diff?r1=text&tr1=1.21&r2=text&tr2=1.20&f=h - wrap xfs bitop functions around generic implementations. 
fs/xfs/xfs_bit.c - 1.32 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_bit.c.diff?r1=text&tr1=1.32&r2=text&tr2=1.31&f=h - Remove implementation of generic bitops. fs/xfs/xfs_rtalloc.c - 1.108 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_rtalloc.c.diff?r1=text&tr1=1.108&r2=text&tr2=1.107&f=h - remove implementation of generic bitops. From owner-xfs@oss.sgi.com Thu Nov 1 18:11:26 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 01 Nov 2007 18:11:31 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.3 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA21BL24006076 for ; Thu, 1 Nov 2007 18:11:24 -0700 Received: from timothy-shimmins-power-mac-g5.local (boing.melbourne.sgi.com [134.14.55.141]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA03042; Fri, 2 Nov 2007 12:11:18 +1100 Message-ID: <472A7940.5070800@sgi.com> Date: Fri, 02 Nov 2007 12:11:28 +1100 From: Timothy Shimmin User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728) MIME-Version: 1.0 To: Roger Willcocks CC: xfs@oss.sgi.com Subject: Re: bug: truncate to zero + setuid References: <47249E7A.7060709@filmlight.ltd.uk> <47252F62.6030503@sgi.com> <47262CD0.5010708@filmlight.ltd.uk> <4726ADAE.9070206@sgi.com> <472769A1.5090605@filmlight.ltd.uk> In-Reply-To: <472769A1.5090605@filmlight.ltd.uk> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4659/Thu Nov 1 09:24:40 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13520 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: tes@sgi.com Precedence: bulk X-list: xfs Hi Roger, Roger Willcocks wrote: > Timothy 
Shimmin wrote: >> I presume it was done where it was done so that the inode was locked >> and we >> were under the XFS_AT_SIZE predicate. >> >> I was just thinking of something like... >> but I'm probably missing something. >> >> Index: 2.6.x-xfs/fs/xfs/xfs_vnodeops.c >> =================================================================== >> --- 2.6.x-xfs.orig/fs/xfs/xfs_vnodeops.c 2007-10-12 >> 16:06:15.000000000 +1000 >> +++ 2.6.x-xfs/fs/xfs/xfs_vnodeops.c 2007-10-30 14:59:46.418837757 >> +1100 >> @@ -304,6 +304,24 @@ >> } >> >> /* >> + * Short circuit the truncate case for zero length files. >> + * If more mask bits are set, then just remove the SIZE one >> + * and keep going. >> + */ >> + if (mask & XFS_AT_SIZE) { >> + xfs_ilock(ip, XFS_ILOCK_SHARED); >> + if ((vap->va_size == 0) && (ip->i_size == 0) && >> (ip->i_d.di_nextents == 0)) { >> + if (mask & ~XFS_AT_SIZE) { >> + mask &= ~XFS_AT_SIZE; >> + } else { >> + xfs_iunlock(ip, XFS_ILOCK_SHARED); >> + return 0; >> + } >> + } >> + xfs_iunlock(ip, XFS_ILOCK_SHARED); >> + } >> + >> + /* >> * For the other attributes, we acquire the inode lock and >> * first do an error checking pass. >> */ >> @@ -451,17 +469,6 @@ >> * Truncate file. Must have write permission and not be a >> directory. >> */ >> if (mask & XFS_AT_SIZE) { >> - /* Short circuit the truncate case for zero length >> files */ >> - if ((vap->va_size == 0) && >> - (ip->i_size == 0) && (ip->i_d.di_nextents == 0)) { >> - xfs_iunlock(ip, XFS_ILOCK_EXCL); >> - lock_flags &= ~XFS_ILOCK_EXCL; >> - if (mask & XFS_AT_CTIME) >> - xfs_ichgtime(ip, XFS_ICHGTIME_MOD | >> XFS_ICHGTIME_CHG); >> - code = 0; >> - goto error_return; >> - } >> - >> if (VN_ISDIR(vp)) { >> code = XFS_ERROR(EISDIR); >> goto error_return; > > This misses setting the access and changed times which still need to be > touched even if the file's already zero bytes. 
How about this:
> (noting that open.c/do_truncate uses XFS_AT_SIZE | XFS_AT_CTIME)
>

Well, if XFS_AT_CTIME was set then my patch wouldn't return straight away and would continue processing the mask fields. And I was presuming that the times would be set doing this.
However, it doesn't look quite so simple and consistent:

* going down the normal (non-short-circuit) path for AT_SIZE, it doesn't test for CTIME but rather just does:

        /*
         * Have to do this even if the file's size doesn't change.
         */
        timeflags |= XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG;

  And yet our short-circuit case only does it if AT_CTIME is set.
  Doesn't look consistent to me.

* So what in the vfs would call it...

-----------------------
open(O_TRUNC)...
int may_open(struct nameidata *nd, int acc_mode, int flag)
{
        if (flag & O_TRUNC) {
        ...
                error = do_truncate(dentry, 0, ATTR_MTIME|ATTR_CTIME, NULL);
------------------------
static long do_sys_ftruncate(unsigned int fd, loff_t length, int small)
{
        if (!error)
                error = do_truncate(dentry, length, ATTR_MTIME|ATTR_CTIME, file);
------------------------

This is NOT the case for
* do_sys_truncate() and
* do_coredump(),
however, which send in zero for those bits.
Not to mention other ways to get to these calls.
Anyway, open(O_TRUNC) will be a good candidate for the optimization by the looks of it as it always trunc's to zero.

So in the 2 truncate cases, it is setting MTIME and CTIME.

And how is MTIME handled in the xfs code...

        /*
         * Change file access or modified times.
         */
        if (mask & (XFS_AT_ATIME|XFS_AT_MTIME)) {
                if (mask & XFS_AT_ATIME) {
                        ip->i_d.di_atime.t_sec = vap->va_atime.tv_sec;
                        ip->i_d.di_atime.t_nsec = vap->va_atime.tv_nsec;
                        ip->i_update_core = 1;
                        timeflags &= ~XFS_ICHGTIME_ACC;
                }
                if (mask & XFS_AT_MTIME) {
                        ip->i_d.di_mtime.t_sec = vap->va_mtime.tv_sec;
                        ip->i_d.di_mtime.t_nsec = vap->va_mtime.tv_nsec;
                        timeflags &= ~XFS_ICHGTIME_MOD;
                        timeflags |= XFS_ICHGTIME_CHG;
                }
                if (tp && (flags & ATTR_UTIME))
                        xfs_trans_log_inode (tp, ip, XFS_ILOG_CORE);
        }

        /*
         * Send out timestamp changes that need to be set to the
         * current time. Not done when called by a DMI function.
         */
        if (timeflags && !(flags & ATTR_DMI))
                xfs_ichgtime(ip, timeflags);

And note that MTIME will actually turn on the timeflags for XFS_ICHGTIME_CHG, which is the time associated with XFS_AT_CTIME.

So for the 2 do_truncate call paths, it will set the mtime based on the va_mtime and set the ctime based on current time (nanotime()).
Our shortcut is setting both to current time.
I don't know if anyone really cares.

I don't like all these inconsistencies.
One way to reduce inconsistencies is to allow code to go thru common paths so we can do the same strange thing in the one spot ;-)

It looks like in the AT_SIZE case, we should always set those timeflags irrespective of AT_CTIME.

BTW, your locking looks wrong - it appears you don't unlock when the file is non-zero size.
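
The timestamp behaviour described above can be modelled in a few lines of user-space C. This is only a sketch: the bit values and the names model_timeflags() and timeflags_examples() are invented for illustration and are not the kernel's definitions; only the flag logic follows the snippets quoted in this message.

```c
#include <assert.h>

/* Illustrative bit values only; the real kernel definitions may differ. */
#define AT_SIZE       0x1u
#define AT_ATIME      0x2u
#define AT_MTIME      0x4u
#define AT_CTIME      0x8u

#define ICHGTIME_MOD  0x1u  /* stamp mtime with the current time */
#define ICHGTIME_ACC  0x2u  /* stamp atime with the current time */
#define ICHGTIME_CHG  0x4u  /* stamp ctime with the current time */

/*
 * Return which timestamps would be set to the current time for a given
 * attribute mask on the normal (non-short-circuit) setattr path.
 */
unsigned int model_timeflags(unsigned int mask)
{
    unsigned int timeflags = 0;

    /* Size changes always request mtime and ctime updates. */
    if (mask & AT_SIZE)
        timeflags |= ICHGTIME_MOD | ICHGTIME_CHG;

    /* An explicit atime is taken from the caller, not the clock. */
    if (mask & AT_ATIME)
        timeflags &= ~ICHGTIME_ACC;

    /* An explicit mtime is taken from the caller, but it forces ctime. */
    if (mask & AT_MTIME) {
        timeflags &= ~ICHGTIME_MOD;
        timeflags |= ICHGTIME_CHG;
    }
    return timeflags;
}

/* Hypothetical usage: the cases discussed in the message. */
void timeflags_examples(void)
{
    /* do_truncate() path: mtime comes from va_mtime, ctime from the clock. */
    assert(model_timeflags(AT_SIZE | AT_MTIME | AT_CTIME) == ICHGTIME_CHG);

    /* Bare size change: both mtime and ctime come from the clock. */
    assert(model_timeflags(AT_SIZE) == (ICHGTIME_MOD | ICHGTIME_CHG));
}
```

Note that AT_CTIME never appears in the model's logic, which is the inconsistency pointed out above: on the normal path the ctime update depends on AT_SIZE or AT_MTIME, whereas the short-circuit path tests AT_CTIME.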
--Tim > --- xfs_vnodeops.c 2007-09-04 15:57:40.000000000 +0100 > +++ /tmp/xfs_vnodeops.c 2007-10-30 17:11:32.000000000 +0000 > @@ -378,6 +378,24 @@ > return (code); > } > > + > + if ((mask & XFS_AT_SIZE) && (vap->va_size == 0)) { > + > + /* Short circuit the truncate case for zero length files */ > + > + xfs_ilock(ip, XFS_ILOCK_EXCL); > + if ((ip->i_d.di_size == 0) && (ip->i_d.di_nextents == 0)) { > + xfs_iunlock(ip, XFS_ILOCK_EXCL); > + if (mask & XFS_AT_CTIME) > + xfs_ichgtime(ip, > XFS_ICHGTIME_MOD|XFS_ICHGTIME_CHG); > + mask &= ~(XFS_AT_SIZE|XFS_AT_CTIME); > + if (mask == 0) { > + code = 0; > + goto error_return; > + } > + } > + } > + > /* > * For the other attributes, we acquire the inode lock and > * first do an error checking pass. > @@ -528,17 +546,6 @@ > * Truncate file. Must have write permission and not be a > directory. > */ > if (mask & XFS_AT_SIZE) { > - /* Short circuit the truncate case for zero length files */ > - if ((vap->va_size == 0) && > - (ip->i_d.di_size == 0) && (ip->i_d.di_nextents == 0)) { > - xfs_iunlock(ip, XFS_ILOCK_EXCL); > - lock_flags &= ~XFS_ILOCK_EXCL; > - if (mask & XFS_AT_CTIME) > - xfs_ichgtime(ip, XFS_ICHGTIME_MOD | > XFS_ICHGTIME_CHG); > - code = 0; > - goto error_return; > - } > - > if (vp->v_type == VDIR) { > code = XFS_ERROR(EISDIR); > goto error_return; From owner-xfs@oss.sgi.com Thu Nov 1 18:41:02 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 01 Nov 2007 18:41:07 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA21exEM009518 for ; Thu, 1 Nov 2007 18:41:01 -0700 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP 
id MAA03691; Fri, 2 Nov 2007 12:40:58 +1100 Received: by chook.melbourne.sgi.com (Postfix, from userid 16346) id 6F86B58C38F7; Fri, 2 Nov 2007 12:40:58 +1100 (EST) To: sgi.bugs.xfs@engr.sgi.com Cc: xfs@oss.sgi.com Subject: PARTIAL TAKE 971186 - Fix up sparse warnings Message-Id: <20071102014058.6F86B58C38F7@chook.melbourne.sgi.com> Date: Fri, 2 Nov 2007 12:40:58 +1100 (EST) From: dgc@sgi.com (David Chinner) X-Virus-Scanned: ClamAV 0.91.2/4659/Thu Nov 1 09:24:40 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13521 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Fix up sparse warnings. These are mostly locking annotations, marking things static, casts where needed and declaring stuff in header files. Date: Fri Nov 2 12:40:25 AEDT 2007 Workarea: chook.melbourne.sgi.com:/build/dgc/isms/2.6.x-xfs Inspected by: hch@infradead.org,lachlan@sgi.com The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/linux/2.6.x-xfs-melb Modid: xfs-linux-melb:xfs-kern:30002a fs/xfs/xfs_log.c - 1.343 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_log.c.diff?r1=text&tr1=1.343&r2=text&tr2=1.342&f=h - Fix up sparse warnings. fs/xfs/xfs_buf_item.h - 1.47 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_buf_item.h.diff?r1=text&tr1=1.47&r2=text&tr2=1.46&f=h - Fix up sparse warnings. fs/xfs/xfs_da_btree.h - 1.67 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_da_btree.h.diff?r1=text&tr1=1.67&r2=text&tr2=1.66&f=h - Fix up sparse warnings. fs/xfs/xfs_log_recover.c - 1.331 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_log_recover.c.diff?r1=text&tr1=1.331&r2=text&tr2=1.330&f=h - Fix up sparse warnings. fs/xfs/xfs_trans_item.c - 1.46 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_trans_item.c.diff?r1=text&tr1=1.46&r2=text&tr2=1.45&f=h - Fix up sparse warnings. 
fs/xfs/xfs_vfsops.c - 1.546 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_vfsops.c.diff?r1=text&tr1=1.546&r2=text&tr2=1.545&f=h - Fix up sparse warnings. fs/xfs/xfs_mount.c - 1.416 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_mount.c.diff?r1=text&tr1=1.416&r2=text&tr2=1.415&f=h - Fix up sparse warnings. fs/xfs/xfs_btree.h - 1.67 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_btree.h.diff?r1=text&tr1=1.67&r2=text&tr2=1.66&f=h - Fix up sparse warnings. fs/xfs/xfs_trans.h - 1.146 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_trans.h.diff?r1=text&tr1=1.146&r2=text&tr2=1.145&f=h - Fix up sparse warnings. fs/xfs/xfs_bmap.h - 1.102 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_bmap.h.diff?r1=text&tr1=1.102&r2=text&tr2=1.101&f=h - Fix up sparse warnings. fs/xfs/xfs_bmap.c - 1.380 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_bmap.c.diff?r1=text&tr1=1.380&r2=text&tr2=1.379&f=h - Fix up sparse warnings. fs/xfs/xfs_rename.c - 1.77 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_rename.c.diff?r1=text&tr1=1.77&r2=text&tr2=1.76&f=h - Fix up sparse warnings. fs/xfs/xfs_attr.c - 1.146 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_attr.c.diff?r1=text&tr1=1.146&r2=text&tr2=1.145&f=h - Fix up sparse warnings. fs/xfs/xfs_dir2.c - 1.61 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_dir2.c.diff?r1=text&tr1=1.61&r2=text&tr2=1.60&f=h - Fix up sparse warnings. fs/xfs/linux-2.6/xfs_ioctl.c - 1.157 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.6/xfs_ioctl.c.diff?r1=text&tr1=1.157&r2=text&tr2=1.156&f=h - Fix up sparse warnings. fs/xfs/linux-2.6/xfs_globals.c - 1.74 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.6/xfs_globals.c.diff?r1=text&tr1=1.74&r2=text&tr2=1.73&f=h - Fix up sparse warnings. 
fs/xfs/linux-2.6/xfs_ioctl32.c - 1.23 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.6/xfs_ioctl32.c.diff?r1=text&tr1=1.23&r2=text&tr2=1.22&f=h - Fix up sparse warnings. fs/xfs/xfs_mru_cache.c - 1.5 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_mru_cache.c.diff?r1=text&tr1=1.5&r2=text&tr2=1.4&f=h - Fix up sparse warnings. fs/xfs/xfs_filestream.c - 1.4 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_filestream.c.diff?r1=text&tr1=1.4&r2=text&tr2=1.3&f=h - Fix up sparse warnings. From owner-xfs@oss.sgi.com Thu Nov 1 18:45:15 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 01 Nov 2007 18:45:20 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA21jCXK010341 for ; Thu, 1 Nov 2007 18:45:14 -0700 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA03857; Fri, 2 Nov 2007 12:45:13 +1100 Received: by chook.melbourne.sgi.com (Postfix, from userid 16346) id 0567058C38F7; Fri, 2 Nov 2007 12:45:12 +1100 (EST) To: sgi.bugs.xfs@engr.sgi.com Cc: xfs@oss.sgi.com Subject: TAKE 972755 - Fix sparse warning in xlog_recover_do_efd_trans. Message-Id: <20071102014513.0567058C38F7@chook.melbourne.sgi.com> Date: Fri, 2 Nov 2007 12:45:12 +1100 (EST) From: dgc@sgi.com (David Chinner) X-Virus-Scanned: ClamAV 0.91.2/4659/Thu Nov 1 09:24:40 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13522 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Fix sparse warning in xlog_recover_do_efd_trans. 
Sparse trips over the locking order in xlog_recover_do_efd_trans() when xfs_trans_delete_ail() drops the ail lock. Because the unlock is conditional, we need to either annotate with a "fake unlock" or change the structure of the code so sparse thinks the function always unlocks. Reordering the code makes it simpler, so do that. Date: Fri Nov 2 12:44:49 AEDT 2007 Workarea: chook.melbourne.sgi.com:/build/dgc/isms/2.6.x-xfs Inspected by: hch@infradead.org, lachlan@sgi.com The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/linux/2.6.x-xfs-melb Modid: xfs-linux-melb:xfs-kern:30003a fs/xfs/xfs_log_recover.c - 1.332 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_log_recover.c.diff?r1=text&tr1=1.332&r2=text&tr2=1.331&f=h - Fix sparse warning in xlog_recover_do_efd_trans. From owner-xfs@oss.sgi.com Thu Nov 1 18:49:11 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 01 Nov 2007 18:49:19 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA21n3ug011063 for ; Thu, 1 Nov 2007 18:49:07 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA03947; Fri, 2 Nov 2007 12:48:57 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lA21mudD89856169; Fri, 2 Nov 2007 12:48:57 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lA21mt2V90549984; Fri, 2 Nov 2007 12:48:55 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Fri, 2 Nov 2007 12:48:55 +1100 From: 
David Chinner To: Christoph Hellwig Cc: David Chinner , xfs@oss.sgi.com, xfs-dev@sgi.com Subject: Re: [PATCH] show all mount args in /proc/mounts Message-ID: <20071102014855.GH995458@sgi.com> References: <20071029233543.GQ995458@sgi.com> <20071030100617.GB23489@infradead.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071030100617.GB23489@infradead.org> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4659/Thu Nov 1 09:24:40 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13523 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Tue, Oct 30, 2007 at 10:06:17AM +0000, Christoph Hellwig wrote: > On Tue, Oct 30, 2007 at 10:35:43AM +1100, David Chinner wrote: > > There are several mount options that don't show up in /proc/mounts. > > Add them in and clean up the showargs code at the same time. > > Looks good. Care to submit a patch ontop of this to move all the mount > option handling to xfs_super.c as it's entirely linux-specific in this > form? Sure. This what you mean? ----- Mount option parsing is platform specific. Move it out of core code into the platform specific superblock operation file. 
Signed-off-by: Dave Chinner --- fs/xfs/linux-2.6/xfs_super.c | 430 +++++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_vfsops.c | 430 ------------------------------------------- fs/xfs/xfs_vfsops.h | 3 3 files changed, 430 insertions(+), 433 deletions(-) Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_super.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_super.c 2007-10-24 16:01:47.000000000 +1000 +++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_super.c 2007-10-31 10:24:31.412771393 +1100 @@ -50,6 +50,7 @@ #include "xfs_vnodeops.h" #include "xfs_vfsops.h" #include "xfs_version.h" +#include "xfs_log_priv.h" #include #include @@ -88,6 +89,435 @@ xfs_args_allocate( return args; } +#define MNTOPT_LOGBUFS "logbufs" /* number of XFS log buffers */ +#define MNTOPT_LOGBSIZE "logbsize" /* size of XFS log buffers */ +#define MNTOPT_LOGDEV "logdev" /* log device */ +#define MNTOPT_RTDEV "rtdev" /* realtime I/O device */ +#define MNTOPT_BIOSIZE "biosize" /* log2 of preferred buffered io size */ +#define MNTOPT_WSYNC "wsync" /* safe-mode nfs compatible mount */ +#define MNTOPT_INO64 "ino64" /* force inodes into 64-bit range */ +#define MNTOPT_NOALIGN "noalign" /* turn off stripe alignment */ +#define MNTOPT_SWALLOC "swalloc" /* turn on stripe width allocation */ +#define MNTOPT_SUNIT "sunit" /* data volume stripe unit */ +#define MNTOPT_SWIDTH "swidth" /* data volume stripe width */ +#define MNTOPT_NOUUID "nouuid" /* ignore filesystem UUID */ +#define MNTOPT_MTPT "mtpt" /* filesystem mount point */ +#define MNTOPT_GRPID "grpid" /* group-ID from parent directory */ +#define MNTOPT_NOGRPID "nogrpid" /* group-ID from current process */ +#define MNTOPT_BSDGROUPS "bsdgroups" /* group-ID from parent directory */ +#define MNTOPT_SYSVGROUPS "sysvgroups" /* group-ID from current process */ +#define MNTOPT_ALLOCSIZE "allocsize" /* preferred allocation size */ +#define MNTOPT_NORECOVERY "norecovery" /* don't run XFS recovery */ +#define 
MNTOPT_BARRIER "barrier" /* use writer barriers for log write and + * unwritten extent conversion */ +#define MNTOPT_NOBARRIER "nobarrier" /* .. disable */ +#define MNTOPT_OSYNCISOSYNC "osyncisosync" /* o_sync is REALLY o_sync */ +#define MNTOPT_64BITINODE "inode64" /* inodes can be allocated anywhere */ +#define MNTOPT_IKEEP "ikeep" /* do not free empty inode clusters */ +#define MNTOPT_NOIKEEP "noikeep" /* free empty inode clusters */ +#define MNTOPT_LARGEIO "largeio" /* report large I/O sizes in stat() */ +#define MNTOPT_NOLARGEIO "nolargeio" /* do not report large I/O sizes + * in stat(). */ +#define MNTOPT_ATTR2 "attr2" /* do use attr2 attribute format */ +#define MNTOPT_NOATTR2 "noattr2" /* do not use attr2 attribute format */ +#define MNTOPT_FILESTREAM "filestreams" /* use filestreams allocator */ +#define MNTOPT_QUOTA "quota" /* disk quotas (user) */ +#define MNTOPT_NOQUOTA "noquota" /* no quotas */ +#define MNTOPT_USRQUOTA "usrquota" /* user quota enabled */ +#define MNTOPT_GRPQUOTA "grpquota" /* group quota enabled */ +#define MNTOPT_PRJQUOTA "prjquota" /* project quota enabled */ +#define MNTOPT_UQUOTA "uquota" /* user quota (IRIX variant) */ +#define MNTOPT_GQUOTA "gquota" /* group quota (IRIX variant) */ +#define MNTOPT_PQUOTA "pquota" /* project quota (IRIX variant) */ +#define MNTOPT_UQUOTANOENF "uqnoenforce"/* user quota limit enforcement */ +#define MNTOPT_GQUOTANOENF "gqnoenforce"/* group quota limit enforcement */ +#define MNTOPT_PQUOTANOENF "pqnoenforce"/* project quota limit enforcement */ +#define MNTOPT_QUOTANOENF "qnoenforce" /* same as uqnoenforce */ +#define MNTOPT_DMAPI "dmapi" /* DMI enabled (DMAPI / XDSM) */ +#define MNTOPT_XDSM "xdsm" /* DMI enabled (DMAPI / XDSM) */ +#define MNTOPT_DMI "dmi" /* DMI enabled (DMAPI / XDSM) */ + +STATIC unsigned long +suffix_strtoul(char *s, char **endp, unsigned int base) +{ + int last, shift_left_factor = 0; + char *value = s; + + last = strlen(value) - 1; + if (value[last] == 'K' || value[last] == 
'k') { + shift_left_factor = 10; + value[last] = '\0'; + } + if (value[last] == 'M' || value[last] == 'm') { + shift_left_factor = 20; + value[last] = '\0'; + } + if (value[last] == 'G' || value[last] == 'g') { + shift_left_factor = 30; + value[last] = '\0'; + } + + return simple_strtoul((const char *)s, endp, base) << shift_left_factor; +} + +STATIC int +xfs_parseargs( + struct xfs_mount *mp, + char *options, + struct xfs_mount_args *args, + int update) +{ + char *this_char, *value, *eov; + int dsunit, dswidth, vol_dsunit, vol_dswidth; + int iosize; + int ikeep = 0; + + args->flags |= XFSMNT_BARRIER; + args->flags2 |= XFSMNT2_COMPAT_IOSIZE; + + if (!options) + goto done; + + iosize = dsunit = dswidth = vol_dsunit = vol_dswidth = 0; + + while ((this_char = strsep(&options, ",")) != NULL) { + if (!*this_char) + continue; + if ((value = strchr(this_char, '=')) != NULL) + *value++ = 0; + + if (!strcmp(this_char, MNTOPT_LOGBUFS)) { + if (!value || !*value) { + cmn_err(CE_WARN, + "XFS: %s option requires an argument", + this_char); + return EINVAL; + } + args->logbufs = simple_strtoul(value, &eov, 10); + } else if (!strcmp(this_char, MNTOPT_LOGBSIZE)) { + if (!value || !*value) { + cmn_err(CE_WARN, + "XFS: %s option requires an argument", + this_char); + return EINVAL; + } + args->logbufsize = suffix_strtoul(value, &eov, 10); + } else if (!strcmp(this_char, MNTOPT_LOGDEV)) { + if (!value || !*value) { + cmn_err(CE_WARN, + "XFS: %s option requires an argument", + this_char); + return EINVAL; + } + strncpy(args->logname, value, MAXNAMELEN); + } else if (!strcmp(this_char, MNTOPT_MTPT)) { + if (!value || !*value) { + cmn_err(CE_WARN, + "XFS: %s option requires an argument", + this_char); + return EINVAL; + } + strncpy(args->mtpt, value, MAXNAMELEN); + } else if (!strcmp(this_char, MNTOPT_RTDEV)) { + if (!value || !*value) { + cmn_err(CE_WARN, + "XFS: %s option requires an argument", + this_char); + return EINVAL; + } + strncpy(args->rtname, value, MAXNAMELEN); + } else if 
(!strcmp(this_char, MNTOPT_BIOSIZE)) { + if (!value || !*value) { + cmn_err(CE_WARN, + "XFS: %s option requires an argument", + this_char); + return EINVAL; + } + iosize = simple_strtoul(value, &eov, 10); + args->flags |= XFSMNT_IOSIZE; + args->iosizelog = (uint8_t) iosize; + } else if (!strcmp(this_char, MNTOPT_ALLOCSIZE)) { + if (!value || !*value) { + cmn_err(CE_WARN, + "XFS: %s option requires an argument", + this_char); + return EINVAL; + } + iosize = suffix_strtoul(value, &eov, 10); + args->flags |= XFSMNT_IOSIZE; + args->iosizelog = ffs(iosize) - 1; + } else if (!strcmp(this_char, MNTOPT_GRPID) || + !strcmp(this_char, MNTOPT_BSDGROUPS)) { + mp->m_flags |= XFS_MOUNT_GRPID; + } else if (!strcmp(this_char, MNTOPT_NOGRPID) || + !strcmp(this_char, MNTOPT_SYSVGROUPS)) { + mp->m_flags &= ~XFS_MOUNT_GRPID; + } else if (!strcmp(this_char, MNTOPT_WSYNC)) { + args->flags |= XFSMNT_WSYNC; + } else if (!strcmp(this_char, MNTOPT_OSYNCISOSYNC)) { + args->flags |= XFSMNT_OSYNCISOSYNC; + } else if (!strcmp(this_char, MNTOPT_NORECOVERY)) { + args->flags |= XFSMNT_NORECOVERY; + } else if (!strcmp(this_char, MNTOPT_INO64)) { + args->flags |= XFSMNT_INO64; +#if !XFS_BIG_INUMS + cmn_err(CE_WARN, + "XFS: %s option not allowed on this system", + this_char); + return EINVAL; +#endif + } else if (!strcmp(this_char, MNTOPT_NOALIGN)) { + args->flags |= XFSMNT_NOALIGN; + } else if (!strcmp(this_char, MNTOPT_SWALLOC)) { + args->flags |= XFSMNT_SWALLOC; + } else if (!strcmp(this_char, MNTOPT_SUNIT)) { + if (!value || !*value) { + cmn_err(CE_WARN, + "XFS: %s option requires an argument", + this_char); + return EINVAL; + } + dsunit = simple_strtoul(value, &eov, 10); + } else if (!strcmp(this_char, MNTOPT_SWIDTH)) { + if (!value || !*value) { + cmn_err(CE_WARN, + "XFS: %s option requires an argument", + this_char); + return EINVAL; + } + dswidth = simple_strtoul(value, &eov, 10); + } else if (!strcmp(this_char, MNTOPT_64BITINODE)) { + args->flags &= ~XFSMNT_32BITINODES; +#if !XFS_BIG_INUMS + 
cmn_err(CE_WARN, + "XFS: %s option not allowed on this system", + this_char); + return EINVAL; +#endif + } else if (!strcmp(this_char, MNTOPT_NOUUID)) { + args->flags |= XFSMNT_NOUUID; + } else if (!strcmp(this_char, MNTOPT_BARRIER)) { + args->flags |= XFSMNT_BARRIER; + } else if (!strcmp(this_char, MNTOPT_NOBARRIER)) { + args->flags &= ~XFSMNT_BARRIER; + } else if (!strcmp(this_char, MNTOPT_IKEEP)) { + ikeep = 1; + args->flags &= ~XFSMNT_IDELETE; + } else if (!strcmp(this_char, MNTOPT_NOIKEEP)) { + args->flags |= XFSMNT_IDELETE; + } else if (!strcmp(this_char, MNTOPT_LARGEIO)) { + args->flags2 &= ~XFSMNT2_COMPAT_IOSIZE; + } else if (!strcmp(this_char, MNTOPT_NOLARGEIO)) { + args->flags2 |= XFSMNT2_COMPAT_IOSIZE; + } else if (!strcmp(this_char, MNTOPT_ATTR2)) { + args->flags |= XFSMNT_ATTR2; + } else if (!strcmp(this_char, MNTOPT_NOATTR2)) { + args->flags &= ~XFSMNT_ATTR2; + } else if (!strcmp(this_char, MNTOPT_FILESTREAM)) { + args->flags2 |= XFSMNT2_FILESTREAMS; + } else if (!strcmp(this_char, MNTOPT_NOQUOTA)) { + args->flags &= ~(XFSMNT_UQUOTAENF|XFSMNT_UQUOTA); + args->flags &= ~(XFSMNT_GQUOTAENF|XFSMNT_GQUOTA); + } else if (!strcmp(this_char, MNTOPT_QUOTA) || + !strcmp(this_char, MNTOPT_UQUOTA) || + !strcmp(this_char, MNTOPT_USRQUOTA)) { + args->flags |= XFSMNT_UQUOTA | XFSMNT_UQUOTAENF; + } else if (!strcmp(this_char, MNTOPT_QUOTANOENF) || + !strcmp(this_char, MNTOPT_UQUOTANOENF)) { + args->flags |= XFSMNT_UQUOTA; + args->flags &= ~XFSMNT_UQUOTAENF; + } else if (!strcmp(this_char, MNTOPT_PQUOTA) || + !strcmp(this_char, MNTOPT_PRJQUOTA)) { + args->flags |= XFSMNT_PQUOTA | XFSMNT_PQUOTAENF; + } else if (!strcmp(this_char, MNTOPT_PQUOTANOENF)) { + args->flags |= XFSMNT_PQUOTA; + args->flags &= ~XFSMNT_PQUOTAENF; + } else if (!strcmp(this_char, MNTOPT_GQUOTA) || + !strcmp(this_char, MNTOPT_GRPQUOTA)) { + args->flags |= XFSMNT_GQUOTA | XFSMNT_GQUOTAENF; + } else if (!strcmp(this_char, MNTOPT_GQUOTANOENF)) { + args->flags |= XFSMNT_GQUOTA; + args->flags &= 
~XFSMNT_GQUOTAENF; + } else if (!strcmp(this_char, MNTOPT_DMAPI)) { + args->flags |= XFSMNT_DMAPI; + } else if (!strcmp(this_char, MNTOPT_XDSM)) { + args->flags |= XFSMNT_DMAPI; + } else if (!strcmp(this_char, MNTOPT_DMI)) { + args->flags |= XFSMNT_DMAPI; + } else if (!strcmp(this_char, "ihashsize")) { + cmn_err(CE_WARN, + "XFS: ihashsize no longer used, option is deprecated."); + } else if (!strcmp(this_char, "osyncisdsync")) { + /* no-op, this is now the default */ + cmn_err(CE_WARN, + "XFS: osyncisdsync is now the default, option is deprecated."); + } else if (!strcmp(this_char, "irixsgid")) { + cmn_err(CE_WARN, + "XFS: irixsgid is now a sysctl(2) variable, option is deprecated."); + } else { + cmn_err(CE_WARN, + "XFS: unknown mount option [%s].", this_char); + return EINVAL; + } + } + + if (args->flags & XFSMNT_NORECOVERY) { + if ((mp->m_flags & XFS_MOUNT_RDONLY) == 0) { + cmn_err(CE_WARN, + "XFS: no-recovery mounts must be read-only."); + return EINVAL; + } + } + + if ((args->flags & XFSMNT_NOALIGN) && (dsunit || dswidth)) { + cmn_err(CE_WARN, + "XFS: sunit and swidth options incompatible with the noalign option"); + return EINVAL; + } + + if ((args->flags & XFSMNT_GQUOTA) && (args->flags & XFSMNT_PQUOTA)) { + cmn_err(CE_WARN, + "XFS: cannot mount with both project and group quota"); + return EINVAL; + } + + if ((args->flags & XFSMNT_DMAPI) && *args->mtpt == '\0') { + printk("XFS: %s option needs the mount point option as well\n", + MNTOPT_DMAPI); + return EINVAL; + } + + if ((dsunit && !dswidth) || (!dsunit && dswidth)) { + cmn_err(CE_WARN, + "XFS: sunit and swidth must be specified together"); + return EINVAL; + } + + if (dsunit && (dswidth % dsunit != 0)) { + cmn_err(CE_WARN, + "XFS: stripe width (%d) must be a multiple of the stripe unit (%d)", + dswidth, dsunit); + return EINVAL; + } + + /* + * Applications using DMI filesystems often expect the + * inode generation number to be monotonically increasing. 
+ * If we delete inode chunks we break this assumption, so + * keep unused inode chunks on disk for DMI filesystems + * until we come up with a better solution. + * Note that if "ikeep" or "noikeep" mount options are + * supplied, then they are honored. + */ + if (!(args->flags & XFSMNT_DMAPI) && !ikeep) + args->flags |= XFSMNT_IDELETE; + + if ((args->flags & XFSMNT_NOALIGN) != XFSMNT_NOALIGN) { + if (dsunit) { + args->sunit = dsunit; + args->flags |= XFSMNT_RETERR; + } else { + args->sunit = vol_dsunit; + } + dswidth ? (args->swidth = dswidth) : + (args->swidth = vol_dswidth); + } else { + args->sunit = args->swidth = 0; + } + +done: + if (args->flags & XFSMNT_32BITINODES) + mp->m_flags |= XFS_MOUNT_SMALL_INUMS; + if (args->flags2) + args->flags |= XFSMNT_FLAGS2; + return 0; +} + +struct proc_xfs_info { + int flag; + char *str; +}; + +STATIC int +xfs_showargs( + struct xfs_mount *mp, + struct seq_file *m) +{ + static struct proc_xfs_info xfs_info_set[] = { + /* the few simple ones we can get from the mount struct */ + { XFS_MOUNT_WSYNC, "," MNTOPT_WSYNC }, + { XFS_MOUNT_INO64, "," MNTOPT_INO64 }, + { XFS_MOUNT_NOALIGN, "," MNTOPT_NOALIGN }, + { XFS_MOUNT_SWALLOC, "," MNTOPT_SWALLOC }, + { XFS_MOUNT_NOUUID, "," MNTOPT_NOUUID }, + { XFS_MOUNT_NORECOVERY, "," MNTOPT_NORECOVERY }, + { XFS_MOUNT_OSYNCISOSYNC, "," MNTOPT_OSYNCISOSYNC }, + { XFS_MOUNT_ATTR2, "," MNTOPT_ATTR2 }, + { XFS_MOUNT_FILESTREAMS, "," MNTOPT_FILESTREAM }, + { XFS_MOUNT_DMAPI, "," MNTOPT_DMAPI }, + { XFS_MOUNT_GRPID, "," MNTOPT_GRPID }, + { 0, NULL } + }; + static struct proc_xfs_info xfs_info_unset[] = { + /* the few simple ones we can get from the mount struct */ + { XFS_MOUNT_IDELETE, "," MNTOPT_IKEEP }, + { XFS_MOUNT_COMPAT_IOSIZE, "," MNTOPT_LARGEIO }, + { XFS_MOUNT_BARRIER, "," MNTOPT_NOBARRIER }, + { XFS_MOUNT_SMALL_INUMS, "," MNTOPT_64BITINODE }, + { 0, NULL } + }; + struct proc_xfs_info *xfs_infop; + + for (xfs_infop = xfs_info_set; xfs_infop->flag; xfs_infop++) { + if (mp->m_flags & 
xfs_infop->flag) + seq_puts(m, xfs_infop->str); + } + for (xfs_infop = xfs_info_unset; xfs_infop->flag; xfs_infop++) { + if (!(mp->m_flags & xfs_infop->flag)) + seq_puts(m, xfs_infop->str); + } + + if (mp->m_flags & XFS_MOUNT_DFLT_IOSIZE) + seq_printf(m, "," MNTOPT_ALLOCSIZE "=%dk", + (int)(1 << mp->m_writeio_log) >> 10); + + if (mp->m_logbufs > 0) + seq_printf(m, "," MNTOPT_LOGBUFS "=%d", mp->m_logbufs); + if (mp->m_logbsize > 0) + seq_printf(m, "," MNTOPT_LOGBSIZE "=%dk", mp->m_logbsize >> 10); + + if (mp->m_logname) + seq_printf(m, "," MNTOPT_LOGDEV "=%s", mp->m_logname); + if (mp->m_rtname) + seq_printf(m, "," MNTOPT_RTDEV "=%s", mp->m_rtname); + + if (mp->m_dalign > 0) + seq_printf(m, "," MNTOPT_SUNIT "=%d", + (int)XFS_FSB_TO_BB(mp, mp->m_dalign)); + if (mp->m_swidth > 0) + seq_printf(m, "," MNTOPT_SWIDTH "=%d", + (int)XFS_FSB_TO_BB(mp, mp->m_swidth)); + + if (mp->m_qflags & (XFS_UQUOTA_ACCT|XFS_UQUOTA_ENFD)) + seq_puts(m, "," MNTOPT_USRQUOTA); + else if (mp->m_qflags & XFS_UQUOTA_ACCT) + seq_puts(m, "," MNTOPT_UQUOTANOENF); + + if (mp->m_qflags & (XFS_PQUOTA_ACCT|XFS_OQUOTA_ENFD)) + seq_puts(m, "," MNTOPT_PRJQUOTA); + else if (mp->m_qflags & XFS_PQUOTA_ACCT) + seq_puts(m, "," MNTOPT_PQUOTANOENF); + + if (mp->m_qflags & (XFS_GQUOTA_ACCT|XFS_OQUOTA_ENFD)) + seq_puts(m, "," MNTOPT_GRPQUOTA); + else if (mp->m_qflags & XFS_GQUOTA_ACCT) + seq_puts(m, "," MNTOPT_GQUOTANOENF); + + if (!(mp->m_qflags & XFS_ALL_QUOTA_ACCT)) + seq_puts(m, "," MNTOPT_NOQUOTA); + + return 0; +} __uint64_t xfs_max_file_offset( unsigned int blockshift) Index: 2.6.x-xfs-new/fs/xfs/xfs_vfsops.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_vfsops.c 2007-10-31 10:06:18.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_vfsops.c 2007-10-31 10:24:25.909483756 +1100 @@ -1482,433 +1482,3 @@ xfs_vget( return 0; } - -#define MNTOPT_LOGBUFS "logbufs" /* number of XFS log buffers */ -#define MNTOPT_LOGBSIZE "logbsize" /* size of XFS log buffers */ 
-#define MNTOPT_LOGDEV "logdev" /* log device */ -#define MNTOPT_RTDEV "rtdev" /* realtime I/O device */ -#define MNTOPT_BIOSIZE "biosize" /* log2 of preferred buffered io size */ -#define MNTOPT_WSYNC "wsync" /* safe-mode nfs compatible mount */ -#define MNTOPT_INO64 "ino64" /* force inodes into 64-bit range */ -#define MNTOPT_NOALIGN "noalign" /* turn off stripe alignment */ -#define MNTOPT_SWALLOC "swalloc" /* turn on stripe width allocation */ -#define MNTOPT_SUNIT "sunit" /* data volume stripe unit */ -#define MNTOPT_SWIDTH "swidth" /* data volume stripe width */ -#define MNTOPT_NOUUID "nouuid" /* ignore filesystem UUID */ -#define MNTOPT_MTPT "mtpt" /* filesystem mount point */ -#define MNTOPT_GRPID "grpid" /* group-ID from parent directory */ -#define MNTOPT_NOGRPID "nogrpid" /* group-ID from current process */ -#define MNTOPT_BSDGROUPS "bsdgroups" /* group-ID from parent directory */ -#define MNTOPT_SYSVGROUPS "sysvgroups" /* group-ID from current process */ -#define MNTOPT_ALLOCSIZE "allocsize" /* preferred allocation size */ -#define MNTOPT_NORECOVERY "norecovery" /* don't run XFS recovery */ -#define MNTOPT_BARRIER "barrier" /* use writer barriers for log write and - * unwritten extent conversion */ -#define MNTOPT_NOBARRIER "nobarrier" /* .. disable */ -#define MNTOPT_OSYNCISOSYNC "osyncisosync" /* o_sync is REALLY o_sync */ -#define MNTOPT_64BITINODE "inode64" /* inodes can be allocated anywhere */ -#define MNTOPT_IKEEP "ikeep" /* do not free empty inode clusters */ -#define MNTOPT_NOIKEEP "noikeep" /* free empty inode clusters */ -#define MNTOPT_LARGEIO "largeio" /* report large I/O sizes in stat() */ -#define MNTOPT_NOLARGEIO "nolargeio" /* do not report large I/O sizes - * in stat(). 
*/ -#define MNTOPT_ATTR2 "attr2" /* do use attr2 attribute format */ -#define MNTOPT_NOATTR2 "noattr2" /* do not use attr2 attribute format */ -#define MNTOPT_FILESTREAM "filestreams" /* use filestreams allocator */ -#define MNTOPT_QUOTA "quota" /* disk quotas (user) */ -#define MNTOPT_NOQUOTA "noquota" /* no quotas */ -#define MNTOPT_USRQUOTA "usrquota" /* user quota enabled */ -#define MNTOPT_GRPQUOTA "grpquota" /* group quota enabled */ -#define MNTOPT_PRJQUOTA "prjquota" /* project quota enabled */ -#define MNTOPT_UQUOTA "uquota" /* user quota (IRIX variant) */ -#define MNTOPT_GQUOTA "gquota" /* group quota (IRIX variant) */ -#define MNTOPT_PQUOTA "pquota" /* project quota (IRIX variant) */ -#define MNTOPT_UQUOTANOENF "uqnoenforce"/* user quota limit enforcement */ -#define MNTOPT_GQUOTANOENF "gqnoenforce"/* group quota limit enforcement */ -#define MNTOPT_PQUOTANOENF "pqnoenforce"/* project quota limit enforcement */ -#define MNTOPT_QUOTANOENF "qnoenforce" /* same as uqnoenforce */ -#define MNTOPT_DMAPI "dmapi" /* DMI enabled (DMAPI / XDSM) */ -#define MNTOPT_XDSM "xdsm" /* DMI enabled (DMAPI / XDSM) */ -#define MNTOPT_DMI "dmi" /* DMI enabled (DMAPI / XDSM) */ - -STATIC unsigned long -suffix_strtoul(char *s, char **endp, unsigned int base) -{ - int last, shift_left_factor = 0; - char *value = s; - - last = strlen(value) - 1; - if (value[last] == 'K' || value[last] == 'k') { - shift_left_factor = 10; - value[last] = '\0'; - } - if (value[last] == 'M' || value[last] == 'm') { - shift_left_factor = 20; - value[last] = '\0'; - } - if (value[last] == 'G' || value[last] == 'g') { - shift_left_factor = 30; - value[last] = '\0'; - } - - return simple_strtoul((const char *)s, endp, base) << shift_left_factor; -} - -int -xfs_parseargs( - struct xfs_mount *mp, - char *options, - struct xfs_mount_args *args, - int update) -{ - char *this_char, *value, *eov; - int dsunit, dswidth, vol_dsunit, vol_dswidth; - int iosize; - int ikeep = 0; - - args->flags |= XFSMNT_BARRIER; - 
args->flags2 |= XFSMNT2_COMPAT_IOSIZE; - - if (!options) - goto done; - - iosize = dsunit = dswidth = vol_dsunit = vol_dswidth = 0; - - while ((this_char = strsep(&options, ",")) != NULL) { - if (!*this_char) - continue; - if ((value = strchr(this_char, '=')) != NULL) - *value++ = 0; - - if (!strcmp(this_char, MNTOPT_LOGBUFS)) { - if (!value || !*value) { - cmn_err(CE_WARN, - "XFS: %s option requires an argument", - this_char); - return EINVAL; - } - args->logbufs = simple_strtoul(value, &eov, 10); - } else if (!strcmp(this_char, MNTOPT_LOGBSIZE)) { - if (!value || !*value) { - cmn_err(CE_WARN, - "XFS: %s option requires an argument", - this_char); - return EINVAL; - } - args->logbufsize = suffix_strtoul(value, &eov, 10); - } else if (!strcmp(this_char, MNTOPT_LOGDEV)) { - if (!value || !*value) { - cmn_err(CE_WARN, - "XFS: %s option requires an argument", - this_char); - return EINVAL; - } - strncpy(args->logname, value, MAXNAMELEN); - } else if (!strcmp(this_char, MNTOPT_MTPT)) { - if (!value || !*value) { - cmn_err(CE_WARN, - "XFS: %s option requires an argument", - this_char); - return EINVAL; - } - strncpy(args->mtpt, value, MAXNAMELEN); - } else if (!strcmp(this_char, MNTOPT_RTDEV)) { - if (!value || !*value) { - cmn_err(CE_WARN, - "XFS: %s option requires an argument", - this_char); - return EINVAL; - } - strncpy(args->rtname, value, MAXNAMELEN); - } else if (!strcmp(this_char, MNTOPT_BIOSIZE)) { - if (!value || !*value) { - cmn_err(CE_WARN, - "XFS: %s option requires an argument", - this_char); - return EINVAL; - } - iosize = simple_strtoul(value, &eov, 10); - args->flags |= XFSMNT_IOSIZE; - args->iosizelog = (uint8_t) iosize; - } else if (!strcmp(this_char, MNTOPT_ALLOCSIZE)) { - if (!value || !*value) { - cmn_err(CE_WARN, - "XFS: %s option requires an argument", - this_char); - return EINVAL; - } - iosize = suffix_strtoul(value, &eov, 10); - args->flags |= XFSMNT_IOSIZE; - args->iosizelog = ffs(iosize) - 1; - } else if (!strcmp(this_char, MNTOPT_GRPID) || 
- !strcmp(this_char, MNTOPT_BSDGROUPS)) { - mp->m_flags |= XFS_MOUNT_GRPID; - } else if (!strcmp(this_char, MNTOPT_NOGRPID) || - !strcmp(this_char, MNTOPT_SYSVGROUPS)) { - mp->m_flags &= ~XFS_MOUNT_GRPID; - } else if (!strcmp(this_char, MNTOPT_WSYNC)) { - args->flags |= XFSMNT_WSYNC; - } else if (!strcmp(this_char, MNTOPT_OSYNCISOSYNC)) { - args->flags |= XFSMNT_OSYNCISOSYNC; - } else if (!strcmp(this_char, MNTOPT_NORECOVERY)) { - args->flags |= XFSMNT_NORECOVERY; - } else if (!strcmp(this_char, MNTOPT_INO64)) { - args->flags |= XFSMNT_INO64; -#if !XFS_BIG_INUMS - cmn_err(CE_WARN, - "XFS: %s option not allowed on this system", - this_char); - return EINVAL; -#endif - } else if (!strcmp(this_char, MNTOPT_NOALIGN)) { - args->flags |= XFSMNT_NOALIGN; - } else if (!strcmp(this_char, MNTOPT_SWALLOC)) { - args->flags |= XFSMNT_SWALLOC; - } else if (!strcmp(this_char, MNTOPT_SUNIT)) { - if (!value || !*value) { - cmn_err(CE_WARN, - "XFS: %s option requires an argument", - this_char); - return EINVAL; - } - dsunit = simple_strtoul(value, &eov, 10); - } else if (!strcmp(this_char, MNTOPT_SWIDTH)) { - if (!value || !*value) { - cmn_err(CE_WARN, - "XFS: %s option requires an argument", - this_char); - return EINVAL; - } - dswidth = simple_strtoul(value, &eov, 10); - } else if (!strcmp(this_char, MNTOPT_64BITINODE)) { - args->flags &= ~XFSMNT_32BITINODES; -#if !XFS_BIG_INUMS - cmn_err(CE_WARN, - "XFS: %s option not allowed on this system", - this_char); - return EINVAL; -#endif - } else if (!strcmp(this_char, MNTOPT_NOUUID)) { - args->flags |= XFSMNT_NOUUID; - } else if (!strcmp(this_char, MNTOPT_BARRIER)) { - args->flags |= XFSMNT_BARRIER; - } else if (!strcmp(this_char, MNTOPT_NOBARRIER)) { - args->flags &= ~XFSMNT_BARRIER; - } else if (!strcmp(this_char, MNTOPT_IKEEP)) { - ikeep = 1; - args->flags &= ~XFSMNT_IDELETE; - } else if (!strcmp(this_char, MNTOPT_NOIKEEP)) { - args->flags |= XFSMNT_IDELETE; - } else if (!strcmp(this_char, MNTOPT_LARGEIO)) { - args->flags2 &= 
~XFSMNT2_COMPAT_IOSIZE; - } else if (!strcmp(this_char, MNTOPT_NOLARGEIO)) { - args->flags2 |= XFSMNT2_COMPAT_IOSIZE; - } else if (!strcmp(this_char, MNTOPT_ATTR2)) { - args->flags |= XFSMNT_ATTR2; - } else if (!strcmp(this_char, MNTOPT_NOATTR2)) { - args->flags &= ~XFSMNT_ATTR2; - } else if (!strcmp(this_char, MNTOPT_FILESTREAM)) { - args->flags2 |= XFSMNT2_FILESTREAMS; - } else if (!strcmp(this_char, MNTOPT_NOQUOTA)) { - args->flags &= ~(XFSMNT_UQUOTAENF|XFSMNT_UQUOTA); - args->flags &= ~(XFSMNT_GQUOTAENF|XFSMNT_GQUOTA); - } else if (!strcmp(this_char, MNTOPT_QUOTA) || - !strcmp(this_char, MNTOPT_UQUOTA) || - !strcmp(this_char, MNTOPT_USRQUOTA)) { - args->flags |= XFSMNT_UQUOTA | XFSMNT_UQUOTAENF; - } else if (!strcmp(this_char, MNTOPT_QUOTANOENF) || - !strcmp(this_char, MNTOPT_UQUOTANOENF)) { - args->flags |= XFSMNT_UQUOTA; - args->flags &= ~XFSMNT_UQUOTAENF; - } else if (!strcmp(this_char, MNTOPT_PQUOTA) || - !strcmp(this_char, MNTOPT_PRJQUOTA)) { - args->flags |= XFSMNT_PQUOTA | XFSMNT_PQUOTAENF; - } else if (!strcmp(this_char, MNTOPT_PQUOTANOENF)) { - args->flags |= XFSMNT_PQUOTA; - args->flags &= ~XFSMNT_PQUOTAENF; - } else if (!strcmp(this_char, MNTOPT_GQUOTA) || - !strcmp(this_char, MNTOPT_GRPQUOTA)) { - args->flags |= XFSMNT_GQUOTA | XFSMNT_GQUOTAENF; - } else if (!strcmp(this_char, MNTOPT_GQUOTANOENF)) { - args->flags |= XFSMNT_GQUOTA; - args->flags &= ~XFSMNT_GQUOTAENF; - } else if (!strcmp(this_char, MNTOPT_DMAPI)) { - args->flags |= XFSMNT_DMAPI; - } else if (!strcmp(this_char, MNTOPT_XDSM)) { - args->flags |= XFSMNT_DMAPI; - } else if (!strcmp(this_char, MNTOPT_DMI)) { - args->flags |= XFSMNT_DMAPI; - } else if (!strcmp(this_char, "ihashsize")) { - cmn_err(CE_WARN, - "XFS: ihashsize no longer used, option is deprecated."); - } else if (!strcmp(this_char, "osyncisdsync")) { - /* no-op, this is now the default */ - cmn_err(CE_WARN, - "XFS: osyncisdsync is now the default, option is deprecated."); - } else if (!strcmp(this_char, "irixsgid")) { - 
cmn_err(CE_WARN, - "XFS: irixsgid is now a sysctl(2) variable, option is deprecated."); - } else { - cmn_err(CE_WARN, - "XFS: unknown mount option [%s].", this_char); - return EINVAL; - } - } - - if (args->flags & XFSMNT_NORECOVERY) { - if ((mp->m_flags & XFS_MOUNT_RDONLY) == 0) { - cmn_err(CE_WARN, - "XFS: no-recovery mounts must be read-only."); - return EINVAL; - } - } - - if ((args->flags & XFSMNT_NOALIGN) && (dsunit || dswidth)) { - cmn_err(CE_WARN, - "XFS: sunit and swidth options incompatible with the noalign option"); - return EINVAL; - } - - if ((args->flags & XFSMNT_GQUOTA) && (args->flags & XFSMNT_PQUOTA)) { - cmn_err(CE_WARN, - "XFS: cannot mount with both project and group quota"); - return EINVAL; - } - - if ((args->flags & XFSMNT_DMAPI) && *args->mtpt == '\0') { - printk("XFS: %s option needs the mount point option as well\n", - MNTOPT_DMAPI); - return EINVAL; - } - - if ((dsunit && !dswidth) || (!dsunit && dswidth)) { - cmn_err(CE_WARN, - "XFS: sunit and swidth must be specified together"); - return EINVAL; - } - - if (dsunit && (dswidth % dsunit != 0)) { - cmn_err(CE_WARN, - "XFS: stripe width (%d) must be a multiple of the stripe unit (%d)", - dswidth, dsunit); - return EINVAL; - } - - /* - * Applications using DMI filesystems often expect the - * inode generation number to be monotonically increasing. - * If we delete inode chunks we break this assumption, so - * keep unused inode chunks on disk for DMI filesystems - * until we come up with a better solution. - * Note that if "ikeep" or "noikeep" mount options are - * supplied, then they are honored. - */ - if (!(args->flags & XFSMNT_DMAPI) && !ikeep) - args->flags |= XFSMNT_IDELETE; - - if ((args->flags & XFSMNT_NOALIGN) != XFSMNT_NOALIGN) { - if (dsunit) { - args->sunit = dsunit; - args->flags |= XFSMNT_RETERR; - } else { - args->sunit = vol_dsunit; - } - dswidth ? 
(args->swidth = dswidth) : - (args->swidth = vol_dswidth); - } else { - args->sunit = args->swidth = 0; - } - -done: - if (args->flags & XFSMNT_32BITINODES) - mp->m_flags |= XFS_MOUNT_SMALL_INUMS; - if (args->flags2) - args->flags |= XFSMNT_FLAGS2; - return 0; -} - -struct proc_xfs_info { - int flag; - char *str; -}; - -int -xfs_showargs( - struct xfs_mount *mp, - struct seq_file *m) -{ - static struct proc_xfs_info xfs_info_set[] = { - /* the few simple ones we can get from the mount struct */ - { XFS_MOUNT_WSYNC, "," MNTOPT_WSYNC }, - { XFS_MOUNT_INO64, "," MNTOPT_INO64 }, - { XFS_MOUNT_NOALIGN, "," MNTOPT_NOALIGN }, - { XFS_MOUNT_SWALLOC, "," MNTOPT_SWALLOC }, - { XFS_MOUNT_NOUUID, "," MNTOPT_NOUUID }, - { XFS_MOUNT_NORECOVERY, "," MNTOPT_NORECOVERY }, - { XFS_MOUNT_OSYNCISOSYNC, "," MNTOPT_OSYNCISOSYNC }, - { XFS_MOUNT_ATTR2, "," MNTOPT_ATTR2 }, - { XFS_MOUNT_FILESTREAMS, "," MNTOPT_FILESTREAM }, - { XFS_MOUNT_DMAPI, "," MNTOPT_DMAPI }, - { XFS_MOUNT_GRPID, "," MNTOPT_GRPID }, - { 0, NULL } - }; - static struct proc_xfs_info xfs_info_unset[] = { - /* the few simple ones we can get from the mount struct */ - { XFS_MOUNT_IDELETE, "," MNTOPT_IKEEP }, - { XFS_MOUNT_COMPAT_IOSIZE, "," MNTOPT_LARGEIO }, - { XFS_MOUNT_BARRIER, "," MNTOPT_NOBARRIER }, - { XFS_MOUNT_SMALL_INUMS, "," MNTOPT_64BITINODE }, - { 0, NULL } - }; - struct proc_xfs_info *xfs_infop; - - for (xfs_infop = xfs_info_set; xfs_infop->flag; xfs_infop++) { - if (mp->m_flags & xfs_infop->flag) - seq_puts(m, xfs_infop->str); - } - for (xfs_infop = xfs_info_unset; xfs_infop->flag; xfs_infop++) { - if (!(mp->m_flags & xfs_infop->flag)) - seq_puts(m, xfs_infop->str); - } - - if (mp->m_flags & XFS_MOUNT_DFLT_IOSIZE) - seq_printf(m, "," MNTOPT_ALLOCSIZE "=%dk", - (int)(1 << mp->m_writeio_log) >> 10); - - if (mp->m_logbufs > 0) - seq_printf(m, "," MNTOPT_LOGBUFS "=%d", mp->m_logbufs); - if (mp->m_logbsize > 0) - seq_printf(m, "," MNTOPT_LOGBSIZE "=%dk", mp->m_logbsize >> 10); - - if (mp->m_logname) - 
seq_printf(m, "," MNTOPT_LOGDEV "=%s", mp->m_logname); - if (mp->m_rtname) - seq_printf(m, "," MNTOPT_RTDEV "=%s", mp->m_rtname); - - if (mp->m_dalign > 0) - seq_printf(m, "," MNTOPT_SUNIT "=%d", - (int)XFS_FSB_TO_BB(mp, mp->m_dalign)); - if (mp->m_swidth > 0) - seq_printf(m, "," MNTOPT_SWIDTH "=%d", - (int)XFS_FSB_TO_BB(mp, mp->m_swidth)); - - if (mp->m_qflags & (XFS_UQUOTA_ACCT|XFS_UQUOTA_ENFD)) - seq_puts(m, "," MNTOPT_USRQUOTA); - else if (mp->m_qflags & XFS_UQUOTA_ACCT) - seq_puts(m, "," MNTOPT_UQUOTANOENF); - - if (mp->m_qflags & (XFS_PQUOTA_ACCT|XFS_OQUOTA_ENFD)) - seq_puts(m, "," MNTOPT_PRJQUOTA); - else if (mp->m_qflags & XFS_PQUOTA_ACCT) - seq_puts(m, "," MNTOPT_PQUOTANOENF); - - if (mp->m_qflags & (XFS_GQUOTA_ACCT|XFS_OQUOTA_ENFD)) - seq_puts(m, "," MNTOPT_GRPQUOTA); - else if (mp->m_qflags & XFS_GQUOTA_ACCT) - seq_puts(m, "," MNTOPT_GQUOTANOENF); - - if (!(mp->m_qflags & XFS_ALL_QUOTA_ACCT)) - seq_puts(m, "," MNTOPT_NOQUOTA); - - return 0; -} Index: 2.6.x-xfs-new/fs/xfs/xfs_vfsops.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_vfsops.h 2007-10-02 16:01:48.000000000 +1000 +++ 2.6.x-xfs-new/fs/xfs/xfs_vfsops.h 2007-10-31 10:24:35.088295625 +1100 @@ -16,9 +16,6 @@ int xfs_mntupdate(struct xfs_mount *mp, int xfs_root(struct xfs_mount *mp, bhv_vnode_t **vpp); int xfs_sync(struct xfs_mount *mp, int flags); int xfs_vget(struct xfs_mount *mp, bhv_vnode_t **vpp, struct xfs_fid *xfid); -int xfs_parseargs(struct xfs_mount *mp, char *options, - struct xfs_mount_args *args, int update); -int xfs_showargs(struct xfs_mount *mp, struct seq_file *m); void xfs_do_force_shutdown(struct xfs_mount *mp, int flags, char *fname, int lnnum); void xfs_attr_quiesce(struct xfs_mount *mp); From owner-xfs@oss.sgi.com Thu Nov 1 18:51:54 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 01 Nov 2007 18:51:58 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: 
No, score=-1.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA21pns7011676 for ; Thu, 1 Nov 2007 18:51:53 -0700 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA04058; Fri, 2 Nov 2007 12:51:50 +1100 Received: by chook.melbourne.sgi.com (Postfix, from userid 16346) id 22C5658C38F7; Fri, 2 Nov 2007 12:51:50 +1100 (EST) To: sgi.bugs.xfs@engr.sgi.com Cc: xfs@oss.sgi.com Subject: PARTIAL TAKE 971186 - Show all mount args in /proc/mounts Message-Id: <20071102015150.22C5658C38F7@chook.melbourne.sgi.com> Date: Fri, 2 Nov 2007 12:51:50 +1100 (EST) From: dgc@sgi.com (David Chinner) X-Virus-Scanned: ClamAV 0.91.2/4659/Thu Nov 1 09:24:40 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13524 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Show all mount args in /proc/mounts There are several mount options that don't show up in /proc/mounts. Add them in and clean up the showargs code at the same time. Date: Fri Nov 2 12:51:17 AEDT 2007 Workarea: chook.melbourne.sgi.com:/build/dgc/isms/2.6.x-xfs Inspected by: hch@infradead.org The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/linux/2.6.x-xfs-melb Modid: xfs-linux-melb:xfs-kern:30004a fs/xfs/xfs_vfsops.c - 1.547 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_vfsops.c.diff?r1=text&tr1=1.547&r2=text&tr2=1.546&f=h - Show all mount args in /proc/mounts. 
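One common way to make sure no mount option is forgotten when it is displayed is to drive the output from a table instead of a chain of if statements. The following is only a userspace sketch of that idea, not the code from the checkin above: the DEMO_* flag values, the demo_opts table and demo_showargs() are all hypothetical names invented for illustration, and only the option strings (noalign, swalloc, nouuid) are real XFS mount options.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flag bits, for illustration only (not the real XFS values). */
#define DEMO_MOUNT_NOALIGN	(1u << 0)
#define DEMO_MOUNT_SWALLOC	(1u << 1)
#define DEMO_MOUNT_NOUUID	(1u << 2)

struct demo_opt {
	unsigned int	flag;	/* bit to test in the mount flags */
	const char	*name;	/* option text to emit when the bit is set */
};

/* One row per boolean option; adding an option means adding one row. */
static const struct demo_opt demo_opts[] = {
	{ DEMO_MOUNT_NOALIGN,	"noalign" },
	{ DEMO_MOUNT_SWALLOC,	"swalloc" },
	{ DEMO_MOUNT_NOUUID,	"nouuid" },
};

/*
 * Append ",name" to buf for every flag that is set.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if buf is too small to hold the result.
 */
static int demo_showargs(unsigned int flags, char *buf, size_t len)
{
	size_t used = 0;
	size_t i;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	for (i = 0; i < sizeof(demo_opts) / sizeof(demo_opts[0]); i++) {
		int n;

		if (!(flags & demo_opts[i].flag))
			continue;
		n = snprintf(buf + used, len - used, ",%s", demo_opts[i].name);
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}
	return (int)used;
}
```

Options that carry a value or have precedence rules (such as the quota options) do not fit a plain flag table and would still need explicit handling.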
From owner-xfs@oss.sgi.com Thu Nov 1 18:57:01 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 01 Nov 2007 18:57:04 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA21uuWM012579 for ; Thu, 1 Nov 2007 18:56:59 -0700 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA04290; Fri, 2 Nov 2007 12:56:56 +1100 Received: by chook.melbourne.sgi.com (Postfix, from userid 16346) id B31AC58C38F7; Fri, 2 Nov 2007 12:56:56 +1100 (EST) To: sgi.bugs.xfs@engr.sgi.com Cc: xfs@oss.sgi.com Subject: TAKE 972757 - Fix transaction overrun during writeback. Message-Id: <20071102015656.B31AC58C38F7@chook.melbourne.sgi.com> Date: Fri, 2 Nov 2007 12:56:56 +1100 (EST) From: dgc@sgi.com (David Chinner) X-Virus-Scanned: ClamAV 0.91.2/4659/Thu Nov 1 09:24:40 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13525 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Fix transaction overrun during writeback. Prevent transaction overrun in xfs_iomap_write_allocate() if we race with a truncate that overlaps the delalloc range we were planning to allocate. If we race, we may allocate into a hole and that requires block allocation. At this point in time we don't have a reservation for block allocation (apart from metadata blocks) and so allocating into a hole rather than a delalloc region results in overflowing the transaction block reservation. Fix it by only allowing a single extent to be allocated at a time. 
Date: Fri Nov 2 12:56:36 AEDT 2007 Workarea: chook.melbourne.sgi.com:/build/dgc/isms/2.6.x-xfs Inspected by: lachlan@sgi.com The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/linux/2.6.x-xfs-melb Modid: xfs-linux-melb:xfs-kern:30005a fs/xfs/xfs_iomap.c - 1.60 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_iomap.c.diff?r1=text&tr1=1.60&r2=text&tr2=1.59&f=h - Only allow xfs_iomap_write_allocate to allocate a single extent at a time to prevent races with truncate from causing unreserved allocation and hence transaction overruns. From owner-xfs@oss.sgi.com Thu Nov 1 19:09:30 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 01 Nov 2007 19:09:33 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS autolearn=ham version=3.3.0-r574664 Received: from sandeen.net (sandeen.net [209.173.210.139]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA229SWT014054 for ; Thu, 1 Nov 2007 19:09:30 -0700 Received: from macmini.sandeen.net (macmini.sandeen.net [10.0.0.61]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by sandeen.net (Postfix) with ESMTP id B3AC418004FD9; Thu, 1 Nov 2007 21:09:31 -0500 (CDT) Message-ID: <472A87FA.7000804@sandeen.net> Date: Thu, 01 Nov 2007 21:14:18 -0500 From: Eric Sandeen User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728) MIME-Version: 1.0 To: Jay Sullivan CC: xfs@oss.sgi.com Subject: Re: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c References: <06CCEA2EB1B80A4A937ED59005FA855101AED1BE@svits26.main.ad.rit.edu> In-Reply-To: <06CCEA2EB1B80A4A937ED59005FA855101AED1BE@svits26.main.ad.rit.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4659/Thu Nov 1 09:24:40 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13526 
X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: sandeen@sandeen.net Precedence: bulk X-list: xfs Jay Sullivan wrote: > I ran xfs_repair -L on the FS and it could be mounted again, Was it not even mountable before this, or why did you use the -L flag? If the log is corrupted that points to more problems... perhaps you've had some power loss & your write caches evaporated, and lvm doesn't do barriers? -eric From owner-xfs@oss.sgi.com Thu Nov 1 19:22:54 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 01 Nov 2007 19:22:57 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: *** X-Spam-Status: No, score=3.0 required=5.0 tests=BAYES_50,HTML_MESSAGE autolearn=no version=3.3.0-r574664 Received: from sc3app27.rit.edu (sc3app27.rit.edu [129.21.35.56]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA22MpxU016149 for ; Thu, 1 Nov 2007 19:22:54 -0700 Received: from cias-jpspgd-macbook.jayps.home (cpe-72-230-182-205.rochester.res.rr.com [72.230.182.205]) by smtp-server.rit.edu (PMDF V6.3-x14 #31420) with ESMTPSA id <0JQU00675XA7QD@smtp-server.rit.edu> for xfs@oss.sgi.com; Thu, 01 Nov 2007 22:22:56 -0400 (EDT) Date: Thu, 01 Nov 2007 22:22:54 -0400 From: Jay Sullivan Subject: Re: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c In-reply-to: <472A87FA.7000804@sandeen.net> To: xfs@oss.sgi.com Message-id: <9489F071-7966-4230-9DAC-D783B6B9600A@rit.edu> MIME-version: 1.0 X-Mailer: Apple Mail (2.912) X-RIT-Received-From: 72.230.182.205 jpspgd@smtp-server.rit.edu References: <06CCEA2EB1B80A4A937ED59005FA855101AED1BE@svits26.main.ad.rit.edu> <472A87FA.7000804@sandeen.net> X-Virus-Scanned: ClamAV 0.91.2/4659/Thu Nov 1 09:24:40 2007 on oss.sgi.com X-Virus-Status: Clean Content-Type: text/plain Content-Disposition: inline Content-Transfer-Encoding: 7bit Content-length: 659 X-archive-position: 13527 X-ecartis-version: Ecartis v1.0.0 
Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jpspgd@rit.edu Precedence: bulk X-list: xfs Good eye: it wasn't mountable, thus the -L flag. No recent (unplanned) power outages. The machine and the array that holds the disks are both on serious batteries/UPS and the array's cache batteries are in good health. ~Jay On Nov 1, 2007, at 10:14 PM, Eric Sandeen wrote: > Jay Sullivan wrote: > > > I ran xfs_repair -L on the FS and it could be mounted again, > > Was it not even mountable before this, or why did you use the -L flag? > If the log is corrupted that points to more problems... perhaps you've > had some power loss & your write caches evaporated, and lvm doesn't do > barriers? > > -eric > > [[HTML alternate version deleted]] From owner-xfs@oss.sgi.com Thu Nov 1 19:30:16 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 01 Nov 2007 19:30:18 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS autolearn=ham version=3.3.0-r574664 Received: from sandeen.net (sandeen.net [209.173.210.139]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA22UDO9017576 for ; Thu, 1 Nov 2007 19:30:15 -0700 Received: from Liberator.local (liberator.sandeen.net [10.0.0.4]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by sandeen.net (Postfix) with ESMTP id F1DFB187B8C0F; Thu, 1 Nov 2007 21:30:17 -0500 (CDT) Message-ID: <472A8BB9.7040100@sandeen.net> Date: Thu, 01 Nov 2007 21:30:17 -0500 From: Eric Sandeen User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728) MIME-Version: 1.0 To: Jay Sullivan CC: xfs@oss.sgi.com Subject: Re: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c References: <06CCEA2EB1B80A4A937ED59005FA855101AED1BE@svits26.main.ad.rit.edu> <472A87FA.7000804@sandeen.net> <9489F071-7966-4230-9DAC-D783B6B9600A@rit.edu> In-Reply-To: 
<9489F071-7966-4230-9DAC-D783B6B9600A@rit.edu> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4659/Thu Nov 1 09:24:40 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13528 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: sandeen@sandeen.net Precedence: bulk X-list: xfs Jay Sullivan wrote: > Good eye: it wasn't mountable, thus the -L flag. No recent > (unplanned) power outages. The machine and the array that holds the > disks are both on serious batteries/UPS and the array's cache > batteries are in good health. Did you have the xfs_repair output to see what it found? You might also grab the very latest xfsprogs (2.9.4) in case it's catching more cases. I hate it when people suggest running memtest86, but I might do that anyway. :) What controller are you using? If you say "areca" I might be on to something with some other bugs I've seen... -Eric From owner-xfs@oss.sgi.com Thu Nov 1 19:35:51 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 01 Nov 2007 19:35:54 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA22Zmhd018421 for ; Thu, 1 Nov 2007 19:35:50 -0700 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA05273; Fri, 2 Nov 2007 13:35:48 +1100 Received: by chook.melbourne.sgi.com (Postfix, from userid 16346) id E27C658C38F7; Fri, 2 Nov 2007 13:35:47 +1100 (EST) To: sgi.bugs.xfs@engr.sgi.com Cc: xfs@oss.sgi.com Subject: TAKE 972753 - Fix inode allocation latency Message-Id: 
<20071102023547.E27C658C38F7@chook.melbourne.sgi.com> Date: Fri, 2 Nov 2007 13:35:47 +1100 (EST) From: dgc@sgi.com (David Chinner) X-Virus-Scanned: ClamAV 0.91.2/4659/Thu Nov 1 09:24:40 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13529 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Fix inode allocation latency The log force added in xfs_iget_core() has been a performance issue since it was introduced for tight loops that allocate then unlink a single file. Under heavy writeback, this can introduce unnecessary latency due to the log I/O getting stuck behind bulk data writes. Fix this latency problem by avoiding the need for the log force by moving the place we mark the linux inode dirty to the transaction commit rather than on transaction completion. This also closes a potential hole in the sync code where a linux inode is not dirty between the time it is modified and the time the log buffer has been written to disk. Date: Fri Nov 2 13:35:27 AEDT 2007 Workarea: chook.melbourne.sgi.com:/build/dgc/isms/2.6.x-xfs Inspected by: hch@infradead.org The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/linux/2.6.x-xfs-melb Modid: xfs-linux-melb:xfs-kern:30007a fs/xfs/xfs_inode_item.c - 1.132 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_inode_item.c.diff?r1=text&tr1=1.132&r2=text&tr2=1.131&f=h - Remove the need to mark the linux inode dirty in xfs_iunpin by marking it dirty during transaction commit. fs/xfs/xfs_iget.c - 1.237 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_iget.c.diff?r1=text&tr1=1.237&r2=text&tr2=1.236&f=h - Remove the need to force the log on pinned inode reuse by making sure we never need to touch the linux inode during transaction completion. 
fs/xfs/xfs_inode.c - 1.485 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_inode.c.diff?r1=text&tr1=1.485&r2=text&tr2=1.484&f=h - Remove the need to mark the linux inode dirty in xfs_iunpin by marking it dirty during transaction commit. fs/xfs/xfs_inode.h - 1.237 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_inode.h.diff?r1=text&tr1=1.237&r2=text&tr2=1.236&f=h - Remove the need to mark the linux inode dirty in xfs_iunpin by marking it dirty during transaction commit. fs/xfs/linux-2.6/xfs_iops.c - 1.267 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.6/xfs_iops.c.diff?r1=text&tr1=1.267&r2=text&tr2=1.266&f=h - Remove the need to mark the linux inode dirty in xfs_iunpin by marking it dirty during transaction commit. From owner-xfs@oss.sgi.com Thu Nov 1 19:43:20 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 01 Nov 2007 19:43:24 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA22hFsG019415 for ; Thu, 1 Nov 2007 19:43:19 -0700 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA05472; Fri, 2 Nov 2007 13:43:14 +1100 Received: by chook.melbourne.sgi.com (Postfix, from userid 16346) id 9BF3458C38F7; Fri, 2 Nov 2007 13:43:14 +1100 (EST) To: sgi.bugs.xfs@engr.sgi.com Cc: xfs@oss.sgi.com Subject: TAKE 972756 - Implement fallocate. 
Message-Id: <20071102024314.9BF3458C38F7@chook.melbourne.sgi.com> Date: Fri, 2 Nov 2007 13:43:14 +1100 (EST) From: dgc@sgi.com (David Chinner) X-Virus-Scanned: ClamAV 0.91.2/4659/Thu Nov 1 09:24:40 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13530 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Implement fallocate. Implement the new generic callout for file preallocation. Atomically change the file size if requested. Date: Fri Nov 2 13:42:52 AEDT 2007 Workarea: chook.melbourne.sgi.com:/build/dgc/isms/2.6.x-xfs Inspected by: hch@infradead.org The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/linux/2.6.x-xfs-melb Modid: xfs-linux-melb:xfs-kern:30009a fs/xfs/linux-2.6/xfs_iops.c - 1.268 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.6/xfs_iops.c.diff?r1=text&tr1=1.268&r2=text&tr2=1.267&f=h - implement ->fallocate() From owner-xfs@oss.sgi.com Thu Nov 1 20:00:19 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 01 Nov 2007 20:00:22 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: *** X-Spam-Status: No, score=3.0 required=5.0 tests=BAYES_50,HTML_MESSAGE autolearn=no version=3.3.0-r574664 Received: from sc3app27.rit.edu (sc3app27.rit.edu [129.21.35.56]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA230Ibt021435 for ; Thu, 1 Nov 2007 20:00:19 -0700 Received: from cias-jpspgd-macbook.jayps.home (cpe-72-230-182-205.rochester.res.rr.com [72.230.182.205]) by smtp-server.rit.edu (PMDF V6.3-x14 #31420) with ESMTPSA id <0JQU00IC1WLNI4@smtp-server.rit.edu> for xfs@oss.sgi.com; Thu, 01 Nov 2007 22:08:13 -0400 (EDT) Date: Thu, 01 Nov 2007 22:08:09 -0400 From: Jay Sullivan Subject: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c To: xfs@oss.sgi.com Message-id: <3A2120EF-3EB0-4CF1-8C4E-920B9688D51F@rit.edu> MIME-version: 
1.0 X-Mailer: Apple Mail (2.912) X-RIT-Received-From: 72.230.182.205 jpspgd@smtp-server.rit.edu X-Virus-Scanned: ClamAV 0.91.2/4659/Thu Nov 1 09:24:40 2007 on oss.sgi.com X-Virus-Status: Clean Content-Type: text/plain Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Content-length: 3288 X-archive-position: 13531 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jpspgd@rit.edu Precedence: bulk X-list: xfs

(Sorry if this is a dupe to the list; it has been a long day.)

I have an XFS filesystem that has had the following happen twice in 3 months, both times an impossibly large block number was requested. Unfortunately my logs don’t go back far enough for me to know if it was the _exact_ same block both times… I’m running xfsprogs 2.8.21. Excerpt from syslog (hostname obfuscated to ‘servername’ to protect the innocent):

##
Nov 1 14:06:32 servername dm-1: rw=0, want=39943195856896, limit=7759462400
Nov 1 14:06:32 servername I/O error in filesystem ("dm-1") meta-data dev dm-1 block 0x245400000ff8 ("xfs_trans_read_buf") error 5 buf count 4096
Nov 1 14:06:32 servername xfs_force_shutdown(dm-1,0x1) called from line 415 of file fs/xfs/xfs_trans_buf.c. Return address = 0xc02baa25
Nov 1 14:06:32 servername Filesystem "dm-1": I/O Error Detected. Shutting down filesystem: dm-1
Nov 1 14:06:32 servername Please umount the filesystem, and rectify the problem(s)
###

I ran xfs_repair -L on the FS and it could be mounted again, but how long until it happens a third time? What concerns me is that this is a FS smaller than 4TB and 39943195856896 (or 0x245400000ff8) seems like a block that I would only have if my FS was muuuuuch larger. The following is output from some pertinent programs:

###
servername ~ # xfs_info /mnt/san
meta-data=/dev/servername-sanvg01/servername-sanlv01 isize=256    agcount=5, agsize=203161600 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=969932800, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0
servername ~ # mount
/dev/sda3 on / type ext3 (rw,noatime,acl)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec)
udev on /dev type tmpfs (rw,nosuid)
devpts on /dev/pts type devpts (rw,nosuid,noexec)
shm on /dev/shm type tmpfs (rw,noexec,nosuid,nodev)
usbfs on /proc/bus/usb type usbfs (rw,noexec,nosuid,devmode=0664,devgid=85)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,noexec,nosuid,nodev)
nfsd on /proc/fs/nfsd type nfsd (rw)
/dev/mapper/servername--sanvg01-servername--sanlv01 on /mnt/san type xfs (rw,noatime,nodiratime,logbufs=8,attr2)
/dev/mapper/servername--sanvg01-servername--rendersharelv01 on /mnt/san/rendershare type xfs (rw,noatime,nodiratime,logbufs=8,attr2)
rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
servername ~ # uname -a
Linux servername 2.6.20-gentoo-r8 #7 SMP Fri Jun 29 14:46:02 EDT 2007 i686 Intel(R) Xeon(TM) CPU 3.20GHz GenuineIntel GNU/Linux
###

Does anyone know if this points to a bad block on a disk or if something is corrupted and can be fixed with some expert knowledge of xfs_db?
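
A quick sanity check of the numbers in the log excerpt, as a minimal sketch. It rests on assumptions that nothing in this thread confirms: that "want" and "limit" are counted in 512-byte sectors, that the XFS "block" value is a 512-byte sector address, and that "want" is the sector just past the end of the 4096-byte read. The helper names are made up for illustration.

```c
#include <stdint.h>

#define SECTOR_SIZE	512u	/* assumed unit of "want", "limit" and "block" */

/* Convert a count of filesystem blocks to 512-byte sectors. */
static uint64_t fsb_to_sectors(uint64_t fs_blocks, uint32_t fs_block_size)
{
	return fs_blocks * (fs_block_size / SECTOR_SIZE);
}

/* Sector just past the end of an I/O of `bytes` starting at `start_sector`. */
static uint64_t io_end_sector(uint64_t start_sector, uint32_t bytes)
{
	return start_sector + bytes / SECTOR_SIZE;
}

/* Nonzero when the I/O would run past the end of the device. */
static int io_beyond_device(uint64_t end_sector, uint64_t limit_sectors)
{
	return end_sector > limit_sectors;
}
```

Under those assumptions the figures are self-consistent: fsb_to_sectors(969932800, 4096) gives 7759462400, which is the "limit", and io_end_sector(0x245400000ff8, 4096) gives 39943195856896, which is the "want". So the device size looks right (about 3.6 TiB), and it is the requested address that is far out of range.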
~Jay [[HTML alternate version deleted]] From owner-xfs@oss.sgi.com Thu Nov 1 21:37:40 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 01 Nov 2007 21:37:45 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.3 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA24bcr0004066 for ; Thu, 1 Nov 2007 21:37:39 -0700 Received: from timothy-shimmins-power-mac-g5.local (boing.melbourne.sgi.com [134.14.55.141]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA07528; Fri, 2 Nov 2007 15:37:34 +1100 Message-ID: <472AA999.6090900@sgi.com> Date: Fri, 02 Nov 2007 15:37:45 +1100 From: Timothy Shimmin User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728) MIME-Version: 1.0 To: Eric Sandeen CC: Jay Sullivan , xfs@oss.sgi.com Subject: Re: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c References: <06CCEA2EB1B80A4A937ED59005FA855101AED1BE@svits26.main.ad.rit.edu> <472A87FA.7000804@sandeen.net> In-Reply-To: <472A87FA.7000804@sandeen.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4659/Thu Nov 1 09:24:40 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13532 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: tes@sgi.com Precedence: bulk X-list: xfs Eric Sandeen wrote: > Jay Sullivan wrote: > >> I ran xfs_repair -L on the FS and it could be mounted again, > > Was it not even mountable before this, or why did you use the -L flag? > If the log is corrupted that points to more problems... perhaps you've > had some power loss & your write caches evaporated, and lvm doesn't do > barriers? 
> > -eric > BTW, I occasionally wonder about the reason for log corruptions. If we have an "evaporated" write cache that would stop a write from going but it wouldn't do a partial sector (< 512 byte) write, would it? I have presumed that sector writes complete or not and that is what the log code is based on. OOI, Jay, how did it fail to mount - what was the log msg? I presume you couldn't mount such that even the log couldn't be replayed? Did it fail during replay? --Tim From owner-xfs@oss.sgi.com Thu Nov 1 22:18:42 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 01 Nov 2007 22:18:50 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA25IdK9012755 for ; Thu, 1 Nov 2007 22:18:41 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA08548; Fri, 2 Nov 2007 16:18:38 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lA25IbdD90702948; Fri, 2 Nov 2007 16:18:37 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lA25IYPj90442928; Fri, 2 Nov 2007 16:18:34 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Fri, 2 Nov 2007 16:18:34 +1100 From: David Chinner To: Jay Sullivan Cc: xfs@oss.sgi.com Subject: Re: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c Message-ID: <20071102051834.GM995458@sgi.com> References: <3A2120EF-3EB0-4CF1-8C4E-920B9688D51F@rit.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline 
Content-Transfer-Encoding: 8bit In-Reply-To: <3A2120EF-3EB0-4CF1-8C4E-920B9688D51F@rit.edu> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4659/Thu Nov 1 09:24:40 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13533 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Thu, Nov 01, 2007 at 10:08:09PM -0400, Jay Sullivan wrote: > (Sorry if this is a dupe to the list; it has been a long day.) > > I have an XFS filesystem that has had the following happen twice in 3 > months, both times an impossibly large block number was requested. .... Sure sign of a corrupted btree. > I ran xfs_repair –L on the FS and it could be mounted again, but how > long until it happens a third time? What was the problem that xfs_repair fixed? BTW, why did you run xfs_repair -L? Also, when it happens next, what does xfs_check tell you is broken? Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Fri Nov 2 02:23:19 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 02 Nov 2007 02:23:29 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from stlx01.stz-softwaretechnik.com (stz-softwaretechnik.de [217.160.223.211]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA29NFqC025821 for ; Fri, 2 Nov 2007 02:23:19 -0700 Received: from rg by stlx01.stz-softwaretechnik.com with local (Exim 3.36 #1 (Debian)) id 1InsVK-0005Cb-00 for ; Fri, 02 Nov 2007 10:07:50 +0100 Date: Fri, 2 Nov 2007 10:07:40 +0100 From: Ralf Gross To: xfs@oss.sgi.com Subject: Re: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c Message-ID: <20071102090740.GB23263@p15145560.pureserver.info> References: 
<06CCEA2EB1B80A4A937ED59005FA855101AED1BE@svits26.main.ad.rit.edu> <472A87FA.7000804@sandeen.net> <9489F071-7966-4230-9DAC-D783B6B9600A@rit.edu> <472A8BB9.7040100@sandeen.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <472A8BB9.7040100@sandeen.net> User-Agent: Mutt/1.5.9i X-Virus-Scanned: ClamAV 0.91.2/4659/Thu Nov 1 09:24:40 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13534 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: Ralf-Lists@ralfgross.de Precedence: bulk X-list: xfs Eric Sandeen schrieb: > ... > What controller are you using? If you say "areca" I might be on to > something with some other bugs I've seen... I use areca controllers with xfs, but had no problems yet. Can you explain what bugs might hit me? Ralf From owner-xfs@oss.sgi.com Fri Nov 2 07:00:04 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 02 Nov 2007 07:00:08 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from SVITS26.main.ad.rit.edu (svits26.main.ad.rit.edu [129.21.18.136]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA2E01B5002655 for ; Fri, 2 Nov 2007 07:00:04 -0700 X-MimeOLE: Produced By Microsoft Exchange V6.5 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Subject: RE: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c Date: Fri, 2 Nov 2007 10:00:23 -0400 Message-ID: <06CCEA2EB1B80A4A937ED59005FA855101AED204@svits26.main.ad.rit.edu> In-Reply-To: <472A8BB9.7040100@sandeen.net> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c Thread-Index: Acgc+HCNQCFWOWFQTvaSJTeWO28rWgAWfz9Q From: "Jay Sullivan" To: X-Virus-Scanned: ClamAV 
0.91.2/4660/Fri Nov 2 05:13:54 2007 on oss.sgi.com X-Virus-Status: Clean Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id lA2E04B5002696 X-archive-position: 13535 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jpspgd@rit.edu Precedence: bulk X-list: xfs I lost the xfs_repair output on an xterm with only four lines of scrollback... I'll definitely be more careful to preserve more 'evidence' next time. =( "Pics or it didn't happen", right? I just upgraded xfsprogs and will scan the disk during my next scheduled downtime (probably in about 2 weeks). I'm tempted to just wipe the volume and start over: I have enough 'spare' space lying around to copy everything out to a fresh XFS volume. Regarding "areca": I'm using hardware RAID built into Apple XServe RAIDs o'er LSI FC929X cards. Someone else offered the likely explanation that the btree is corrupted. Isn't this something xfs_repair should be able to fix? Would it be easier, safer, and faster to move the data to a new volume (and restore corrupted files if/as I find them from backup)? We're talking about just less than 4TB of data which used to take about 6 hours to fsck (one pass) with ext3. Restoring the whole shebang from backups would probably take the better part of 12 years (waiting for compression, resetting ACLs, etc.)... FWIW, another (way less important,) much busier and significantly larger logical volume on the same array has been totally fine. Murphy--go figure. Thanks! -----Original Message----- From: Eric Sandeen [mailto:sandeen@sandeen.net] Sent: Thursday, November 01, 2007 10:30 PM To: Jay Sullivan Cc: xfs@oss.sgi.com Subject: Re: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c Jay Sullivan wrote: > Good eye: it wasn't mountable, thus the -L flag. No recent > (unplanned) power outages. 
The machine and the array that holds the > disks are both on serious batteries/UPS and the array's cache > batteries are in good health. Did you have the xfs_repair output to see what it found? You might also grab the very latest xfsprogs (2.9.4) in case it's catching more cases. I hate it when people suggest running memtest86, but I might do that anyway. :) What controller are you using? If you say "areca" I might be on to something with some other bugs I've seen... -Eric From owner-xfs@oss.sgi.com Fri Nov 2 07:48:57 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 02 Nov 2007 07:49:01 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-0.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from SVITS26.main.ad.rit.edu (svits26.main.ad.rit.edu [129.21.18.136]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA2Emso4008131 for ; Fri, 2 Nov 2007 07:48:57 -0700 X-MimeOLE: Produced By Microsoft Exchange V6.5 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Subject: RE: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c Date: Fri, 2 Nov 2007 10:49:16 -0400 Message-ID: <06CCEA2EB1B80A4A937ED59005FA855101AED213@svits26.main.ad.rit.edu> In-Reply-To: <06CCEA2EB1B80A4A937ED59005FA855101AED204@svits26.main.ad.rit.edu> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c Thread-Index: Acgc+HCNQCFWOWFQTvaSJTeWO28rWgAWfz9QAAMsAYA= From: "Jay Sullivan" To: X-Virus-Scanned: ClamAV 0.91.2/4661/Fri Nov 2 06:48:31 2007 on oss.sgi.com X-Virus-Status: Clean Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id lA2Emvo4008137 X-archive-position: 13536 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jpspgd@rit.edu 
Precedence: bulk X-list: xfs What can I say about Murphy and his silly laws? I just had a drive fail on my array. I wonder if this is the root of my problems... Yay parity. ~Jay -----Original Message----- From: xfs-bounce@oss.sgi.com [mailto:xfs-bounce@oss.sgi.com] On Behalf Of Jay Sullivan Sent: Friday, November 02, 2007 10:00 AM To: xfs@oss.sgi.com Subject: RE: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c I lost the xfs_repair output on an xterm with only four lines of scrollback... I'll definitely be more careful to preserve more 'evidence' next time. =( "Pics or it didn't happen", right? I just upgraded xfsprogs and will scan the disk during my next scheduled downtime (probably in about 2 weeks). I'm tempted to just wipe the volume and start over: I have enough 'spare' space lying around to copy everything out to a fresh XFS volume. Regarding "areca": I'm using hardware RAID built into Apple XServe RAIDs o'er LSI FC929X cards. Someone else offered the likely explanation that the btree is corrupted. Isn't this something xfs_repair should be able to fix? Would it be easier, safer, and faster to move the data to a new volume (and restore corrupted files if/as I find them from backup)? We're talking about just less than 4TB of data which used to take about 6 hours to fsck (one pass) with ext3. Restoring the whole shebang from backups would probably take the better part of 12 years (waiting for compression, resetting ACLs, etc.)... FWIW, another (way less important,) much busier and significantly larger logical volume on the same array has been totally fine. Murphy--go figure. Thanks! -----Original Message----- From: Eric Sandeen [mailto:sandeen@sandeen.net] Sent: Thursday, November 01, 2007 10:30 PM To: Jay Sullivan Cc: xfs@oss.sgi.com Subject: Re: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c Jay Sullivan wrote: > Good eye: it wasn't mountable, thus the -L flag. No recent > (unplanned) power outages. 
The machine and the array that holds the > disks are both on serious batteries/UPS and the array's cache > batteries are in good health. Did you have the xfs_repair output to see what it found? You might also grab the very latest xfsprogs (2.9.4) in case it's catching more cases. I hate it when people suggest running memtest86, but I might do that anyway. :) What controller are you using? If you say "areca" I might be on to something with some other bugs I've seen... -Eric From owner-xfs@oss.sgi.com Fri Nov 2 09:10:36 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 02 Nov 2007 09:10:39 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-4.6 required=5.0 tests=AWL,BAYES_00, RCVD_IN_DNSWL_MED,SPF_HELO_PASS autolearn=ham version=3.3.0-r574664 Received: from mx1.redhat.com (mx1.redhat.com [66.187.233.31]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA2GAWvv021419 for ; Fri, 2 Nov 2007 09:10:36 -0700 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.13.8/8.13.1) with ESMTP id lA2GAawa023026; Fri, 2 Nov 2007 12:10:37 -0400 Received: from lacrosse.corp.redhat.com (lacrosse.corp.redhat.com [172.16.52.154]) by int-mx1.corp.redhat.com (8.13.1/8.13.1) with ESMTP id lA2GAa6B023757; Fri, 2 Nov 2007 12:10:36 -0400 Received: from [10.15.80.10] (neon.msp.redhat.com [10.15.80.10]) by lacrosse.corp.redhat.com (8.12.11.20060308/8.11.6) with ESMTP id lA2GAXtK017600; Fri, 2 Nov 2007 12:10:35 -0400 Message-ID: <472B4BF8.2040500@sandeen.net> Date: Fri, 02 Nov 2007 11:10:32 -0500 From: Eric Sandeen User-Agent: Thunderbird 1.5.0.12 (X11/20070530) MIME-Version: 1.0 To: Ralf Gross CC: xfs@oss.sgi.com Subject: Re: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c References: <06CCEA2EB1B80A4A937ED59005FA855101AED1BE@svits26.main.ad.rit.edu> <472A87FA.7000804@sandeen.net> <9489F071-7966-4230-9DAC-D783B6B9600A@rit.edu> 
<472A8BB9.7040100@sandeen.net> <20071102090740.GB23263@p15145560.pureserver.info> In-Reply-To: <20071102090740.GB23263@p15145560.pureserver.info> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4661/Fri Nov 2 06:48:31 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13537 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: sandeen@sandeen.net Precedence: bulk X-list: xfs Ralf Gross wrote: > Eric Sandeen schrieb: >> ... >> What controller are you using? If you say "areca" I might be on to >> something with some other bugs I've seen... > > I use areca controllers with xfs, but had no problems yet. Can you > explain what bugs might hit me? maybe none, it was just a wild guess. :) I've seen a bug on ext3, volumes > 2T corrupted, on an areca controller. Due to the 2T threshold, it seems more like a lower layer IO issue (2^32 x 512) than a filesystem issue... googling a bit I found others with problems on areca, but then that's what I googled for, so I might have self-selected. So, maybe nothing, I was just looking for a 3rd data point. 
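For reference, the 2T threshold mentioned above is plain arithmetic: a 32-bit sector number with 512-byte sectors addresses 2^32 x 512 bytes, which is 2 TiB. The sketch below is only an illustration and is not taken from any driver or from the thread. It assumes 512-byte sectors, and the helper names (limit_bytes, wrapped_sector) are hypothetical. The second helper shows what a layer that silently truncated sector numbers to 32 bits would do to an I/O just past the limit; that is one possible way such corruption could arise, and nothing in the thread confirms it is what happened on the areca setup.

```python
# Illustration only: where a "2T" limit comes from with 32-bit sector numbers.
SECTOR_SIZE = 512  # bytes per sector; assumed classic sector size


def limit_bytes(sector_bits=32, sector_size=SECTOR_SIZE):
    """Bytes addressable when sector numbers are sector_bits wide."""
    return (1 << sector_bits) * sector_size


def wrapped_sector(sector, sector_bits=32):
    """Sector number left after truncation to sector_bits bits.

    Hypothetical failure mode: a lower layer that stores the sector
    number in too narrow a field would wrap around like this.
    """
    return sector & ((1 << sector_bits) - 1)


if __name__ == "__main__":
    limit = limit_bytes()
    print(limit, "bytes =", limit // 2**40, "TiB")

    # First sector past the limit, plus a small offset.
    target = limit // SECTOR_SIZE + 5
    print("requested sector", target, "-> truncated sector", wrapped_sector(target))
```

Under these assumptions, a write aimed just past the 2 TiB mark would wrap around and land near the start of the device, which would look like filesystem corruption regardless of whether the filesystem is ext3 or XFS.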
-Eric From owner-xfs@oss.sgi.com Fri Nov 2 14:26:58 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 02 Nov 2007 14:27:03 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from py-out-1112.google.com (py-out-1112.google.com [64.233.166.181]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA2LQukO027923 for ; Fri, 2 Nov 2007 14:26:58 -0700 Received: by py-out-1112.google.com with SMTP id u77so1825595pyb for ; Fri, 02 Nov 2007 14:27:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=DTKb1zBYQ3TjarPcNAj9UXy0ml4Bz5QWewqBR0273K8=; b=hXG0H2lmrzz43qNHUTZ7z543vFzvhnPxfGZMhYAocUOyvva91yQrW/JeqFT1phsyEIUCDYSk2u2TnFHb3fvGJ+ZslMQ8RcAh94BtlUmuWk1Ol94JekxF+PJM7n+TRzAmOEwJBmble8Yd+xib4bEs/GFaJwisxyPjfXB5hYYreLI= DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=beta; h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=Ed+heS0BoheRcPzOeLRxaBnYfh0mZ/GfjIwVlbZ66GTVnoHuA1ZK+TSms058aPJ5tlE/Emy7ZPlU5KhlR66KQTrkUYdfD4SV7N6jDlAVdYOFfO7OkanYx9rBBgmpe9PmYChwrlV2i7R4dMd7SyyOU5azLuk6MG92kMna+KKTH1g= Received: by 10.64.27.13 with SMTP id a13mr7234527qba.1194037343896; Fri, 02 Nov 2007 14:02:23 -0700 (PDT) Received: by 10.65.112.13 with HTTP; Fri, 2 Nov 2007 14:02:23 -0700 (PDT) Message-ID: <64bb37e0711021402g4961e474u75e48fa5a893ab7a@mail.gmail.com> Date: Fri, 2 Nov 2007 22:02:23 +0100 From: "Torsten Kaiser" To: "David Chinner" Subject: Re: writeout stalls in current -git Cc: "Peter Zijlstra" , "Fengguang Wu" , "Maxim Levitsky" , linux-kernel@vger.kernel.org, "Andrew Morton" , 
linux-fsdevel@vger.kernel.org, xfs@oss.sgi.com In-Reply-To: <20071102204258.GR995458@sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <200710221505.35397.maximlevitsky@gmail.com> <393060478.03650@ustc.edu.cn> <64bb37e0710310822r5ca6b793p8fd97db2f72a8655@mail.gmail.com> <393903856.06449@ustc.edu.cn> <64bb37e0711011120i63cdfe3ci18995d57b6649a8@mail.gmail.com> <64bb37e0711011200n228e708eg255640388f83da22@mail.gmail.com> <1193998532.27652.343.camel@twins> <64bb37e0711021222q7d12c825mc62d433c4fe19e8@mail.gmail.com> <20071102204258.GR995458@sgi.com> X-Virus-Scanned: ClamAV 0.91.2/4662/Fri Nov 2 10:28:34 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13539 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: just.for.lkml@googlemail.com Precedence: bulk X-list: xfs On 11/2/07, David Chinner wrote: > On Fri, Nov 02, 2007 at 08:22:10PM +0100, Torsten Kaiser wrote: > > [ 630.000000] SysRq : Emergency Sync > > [ 630.120000] Emergency Sync complete > > [ 632.850000] SysRq : Show Blocked State > > [ 632.850000] task PC stack pid father > > [ 632.850000] pdflush D ffff81000f091788 0 285 2 > > [ 632.850000] ffff810005d4da80 0000000000000046 0000000000000800 > > 0000007000000001 > > [ 632.850000] ffff81000fd52400 ffffffff8022d61c ffffffff80819b00 > > ffffffff80819b00 > > [ 632.850000] ffffffff80815f40 ffffffff80819b00 ffff810100316f98 > > 0000000000000000 > > [ 632.850000] Call Trace: > > [ 632.850000] [] task_rq_lock+0x4c/0x90 > > [ 632.850000] [] __wake_up_common+0x5a/0x90 > > [ 632.850000] [] __down+0xa7/0x11e > > [ 632.850000] [] default_wake_function+0x0/0x10 > > [ 632.850000] [] __down_failed+0x35/0x3a > > [ 632.850000] [] xfs_buf_lock+0x3e/0x40 > > [ 632.850000] [] _xfs_buf_find+0x13e/0x240 > > [ 632.850000] [] xfs_buf_get_flags+0x6f/0x190 > > [ 632.850000] [] xfs_buf_read_flags+0x12/0xa0 > > [ 
632.850000] [] xfs_trans_read_buf+0x64/0x340
> > [ 632.850000] [] xfs_itobp+0x81/0x1e0
> > [ 632.850000] [] write_cache_pages+0x123/0x330
> > [ 632.850000] [] xfs_iflush+0xfe/0x520
>
> That's stalled waiting on the inode cluster buffer lock. That implies
> that the inode cluster is already being written out and the inode has
> been redirtied during writeout.
>
> Does the kernel you are testing have the "flush inodes in ascending
> inode number order" patches applied? If so, can you remove that
> patch and see if the problem goes away?

It's 2.6.23-mm1 with only some small fixes.

In its broken-out directory I see:
git-xfs.patch
and
writeback-fix-periodic-superblock-dirty-inode-flushing.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-2.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-3.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-4.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-5.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-6.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-7.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists.patch
writeback-fix-time-ordering-of-the-per-superblock-inode-lists-8.patch
writeback-introduce-writeback_controlmore_io-to-indicate-more-io.patch

I don't know if the patch you mentioned is part of that version of the
mm-patchset.

Torsten From owner-xfs@oss.sgi.com Sun Nov 4 02:18:28 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 04 Nov 2007 02:18:31 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA4AIP0a001629 for ; Sun, 4 Nov 2007 02:18:27 -0800 Received: from hch by pentafluge.infradead.org with local (Exim 4.63 #1 (Red Hat Linux)) id 1Ioc6b-0003m6-Pn; Sun, 04 Nov 2007 09:49:21 +0000 Date: Sun, 4 Nov 2007 09:49:21 +0000 From: Christoph Hellwig To: David Chinner Cc: Christoph Hellwig , xfs@oss.sgi.com, xfs-dev@sgi.com Subject: Re: [PATCH] show all mount args in /proc/mounts Message-ID: <20071104094921.GA14493@infradead.org> References: <20071029233543.GQ995458@sgi.com> <20071030100617.GB23489@infradead.org> <20071102014855.GH995458@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071102014855.GH995458@sgi.com> User-Agent: Mutt/1.4.2.3i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-Virus-Scanned: ClamAV 0.91.2/4671/Sat Nov 3 18:21:59 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13540 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs On Fri, Nov 02, 2007 at 12:48:55PM +1100, David Chinner wrote: > On Tue, Oct 30, 2007 at 10:06:17AM +0000, Christoph Hellwig wrote: > > On Tue, Oct 30, 2007 at 10:35:43AM +1100, David Chinner wrote: > > > There are several mount options that don't show up in /proc/mounts. > > > Add them in and clean up the showargs code at the same time. > > > > Looks good. 
Care to submit a patch on top of this to move all the mount
> > option handling to xfs_super.c as it's entirely linux-specific in this
> > form?
>
> Sure. This what you mean?

Yes, exactly. Looks good to me.

From owner-xfs@oss.sgi.com Sun Nov 4 03:19:19 2007
Received: with ECARTIS (v1.0.0; list xfs); Sun, 04 Nov 2007 03:19:29 -0800 (PST)
X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-1.3 required=5.0 tests=AWL,BAYES_50 autolearn=ham version=3.3.0-r574664
Received: from py-out-1112.google.com (py-out-1112.google.com [64.233.166.178]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA4BJF5q008590 for ; Sun, 4 Nov 2007 03:19:19 -0800
Received: by py-out-1112.google.com with SMTP id u77so2534901pyb for ; Sun, 04 Nov 2007 03:19:19 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:references; bh=NZykAKLjtcuHzPXcDO65poQqxlxE3dMCg+zJZrvejS0=; b=rHYLZ8SaYwJqQoncetNIx4YSijxxofphfL00YRNTU6i8LdS+9uyBgii42Z3bKKVnU+HjrWVmlJNlh/y9QlQaiGMWg7DPQKBZUoPDvGN1i4z9m3uZnMkl9VyJ+4xg0Lyw6Ytv6geFqh101j+WUNj4by2zyop9UDyBYJIfeVP/SGM=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=beta; h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:references; b=BpWxF4kZt/GKotSn5NDhf1si6zuKIbTtlVRxenmCytnfcbkc6qTnpQsbJ441z90lqX/O5kgv671cKUG5JKECJOPEA0BEmGz/eOPnJKGtGkSvb4CGwjrkNlQ0fPSFeDPRJ+I72+0JaTCS1PtW6lvl3GjsJFBpF2P56ExPJLD6m00=
Received: by 10.65.211.16 with SMTP id n16mr10422027qbq.1194175159261; Sun, 04 Nov 2007 03:19:19 -0800 (PST)
Received: by 10.65.112.13 with HTTP; Sun, 4 Nov 2007 03:19:19 -0800 (PST)
Message-ID: <64bb37e0711040319l5de285c3xea64474540a51b6e@mail.gmail.com>
Date: Sun, 4 Nov 2007 12:19:19 +0100
From: "Torsten Kaiser"
To: "David Chinner"
Subject: Re: writeout stalls in current -git
Cc: "Peter Zijlstra" , "Fengguang Wu" , "Maxim Levitsky" , linux-kernel@vger.kernel.org, "Andrew Morton" , linux-fsdevel@vger.kernel.org, xfs@oss.sgi.com
In-Reply-To: <20071102204258.GR995458@sgi.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_Part_16409_14543774.1194175159246"
References: <200710221505.35397.maximlevitsky@gmail.com> <393060478.03650@ustc.edu.cn> <64bb37e0710310822r5ca6b793p8fd97db2f72a8655@mail.gmail.com> <393903856.06449@ustc.edu.cn> <64bb37e0711011120i63cdfe3ci18995d57b6649a8@mail.gmail.com> <64bb37e0711011200n228e708eg255640388f83da22@mail.gmail.com> <1193998532.27652.343.camel@twins> <64bb37e0711021222q7d12c825mc62d433c4fe19e8@mail.gmail.com> <20071102204258.GR995458@sgi.com>
X-Virus-Scanned: ClamAV 0.91.2/4671/Sat Nov 3 18:21:59 2007 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 13541
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: just.for.lkml@googlemail.com
Precedence: bulk
X-list: xfs

------=_Part_16409_14543774.1194175159246
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

On 11/2/07, David Chinner wrote:
> That's stalled waiting on the inode cluster buffer lock. That implies
> that the inode cluster is already being written out and the inode has
> been redirtied during writeout.
>
> Does the kernel you are testing have the "flush inodes in ascending
> inode number order" patches applied? If so, can you remove that
> patch and see if the problem goes away?

I can now confirm that I also see this with the current mainline git version.
I used 2.6.24-rc1-git-b4f555081fdd27d13e6ff39d455d5aefae9d2c0c plus the
fix for the sg changes in ieee1394.
Bisecting would be troublesome, as the sg changes prevent mainline from
booting with my normal config / kill my network.

treogen ~ # vmstat 10
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b swpd    free buff  cache si so   bi   bo    in   cs us sy id wa
-> starting emerge
 1  0    0 3627072  332 157724  0  0   97   13    41  189  2  2 94  2
 0  0    0 3607240  332 163736  0  0  599   10   332  951  2  1 93  4
 0  0    0 3601920  332 167592  0  0  380    2   218  870  1  1 98  0
 0  0    0 3596356  332 171648  0  0  404   21   182  818  0  0 99  0
 0  0    0 3579328  332 180436  0  0  878   12   147  912  1  1 97  2
 0  0    0 3575376  332 182776  0  0  236    4   244  953  1  1 95  3
 2  1    0 3571792  332 185084  0  0  232    7   256 1003  2  1 95  2
 0  0    0 3564844  332 187364  0  0  228  605   246 1167  2  1 93  4
 0  0    0 3562128  332 189784  0  0  230    4   527 1238  2  1 93  4
 0  1    0 3558764  332 191964  0  0  216   24   438 1059  1  1 93  6
 0  0    0 3555120  332 193868  0  0  199   36   406  959  0  0 92  8
 0  0    0 3552008  332 195928  0  0  197   11   458 1023  1  1 90  8
 0  0    0 3548728  332 197660  0  0  183    7   496 1086  1  1 90  8
 0  0    0 3545560  332 199372  0  0  170    8   483 1017  1  1 90  9
 0  1    0 3542124  332 201256  0  0  190    1   544 1137  1  1 88 10
 1  0    0 3536924  332 203296  0  0  195    7   637 1209  2  1 89  8
 1  1    0 3485096  332 249184  0  0  101   16 10372 4537 13  3 76  8
 2  0    0 3442004  332 279728  0  0 1086   40   219 1349  7  3 87  4
-> emerge is done reading its package database
 1  0    0 3254796  332 448636  0  0    0   27   128 8360 24  6 70  0
 2  0    0 3143304  332 554016  0  0   47   33   213 4480 16 11 72  1
-> kernel unpacked
 1  0    0 3125700  332 560416  0  0    1   20   122 1675 24  1 75  0
 1  0    0 3117356  332 567968  0  0    0  674   157 2975 24  2 73  1
 2  0    0 3111636  332 573736  0  0    0 1143   151 1924 23  1 75  1
 2  0    0 3102836  332 581332  0  0    0  890   153 1330 24  1 75  0
 1  0    0 3097236  332 587360  0  0    0  656   194 1593 24  1 74  0
 1  0    0 3086824  332 595480  0  0    0  812   235 2657 25  1 74  0
-> tar.bz2 created, installing starts now
 0  0    0 3091612  332 601024  0  0   82  708   499 2397 17  4 78  1
 0  0    0 3086088  332 602180  0  0   69 2459   769 2237  3  4 88  6
 0  0    0 3085916  332 602236  0  0    2 1752   693  949  1  2 96  1
 0  0    0 3084544  332 603564  0  0   66 4057  1176 2850  3  6 91  0
 0  0    0 3078780  332 605572  0  0   98 3194  1169 3288  5  6 89  0
 0  0    0 3077940  332 605924  0  0   17 1139   823 1547  1  2 97  0
 0  0    0 3078268  332 605924  0  0    0  888   807 1329  0  1 99  0
-> first short stall
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b swpd    free buff  cache si so   bi   bo    in   cs us sy id wa
 0  0    0 3077040  332 605924  0  0    0 1950   785 1495  0  2 89  8
 0  0    0 3076588  332 605896  0  0    2 3807   925 2046  1  4 95  0
 0  0    0 3076900  332 606052  0  0   11 2564   768 1471  1  3 95  1
 0  0    0 3071584  332 607928  0  0   87 2499  1108 3433  4  6 90  0
-> second longer stall
(emerge was not able to complete a single filemove until the 'resume' line)
 0  0    0 3071592  332 607928  0  0    0  693   692 1289  0  0 99  0
 0  0    0 3072584  332 607928  0  0    0  792   731 1507  0  1 99  0
 0  0    0 3072840  332 607928  0  0    0  806   707 1521  0  1 99  0
 0  0    0 3072724  332 607928  0  0    0  782   695 1372  0  0 99  0
 0  0    0 3072972  332 607928  0  0    0  677   612 1301  0  0 99  0
 0  0    0 3072772  332 607928  0  0    0  738   681 1352  1  1 99  0
 0  0    0 3073020  332 607928  0  0    0  785   708 1328  0  1 99  0
 0  0    0 3072896  332 607928  0  0    0  833   722 1383  0  0 99  0
-> emerge resumed
 0  0    0 3069476  332 607972  0  0    2 4885   812 2062  1  4 90  5
 1  0    0 3069648  332 608068  0  0    4 4658   833 2158  1  4 93  2
 0  0    0 3064972  332 610364  0  0  106 2494  1095 3620  5  7 88  0
 0  0    0 3057536  332 612444  0  0   86 2023  1012 3440  4  6 90  0
 1  0    0 3054572  332 612368  0  0  102 1526  1024 2277  6  5 87  2
-> emerge finished, but still >100Mb of dirty data according to /proc/meminfo
 0  0    0 3048548  332 615764  0  0  337  659   796 1000  3  1 96  0
 0  0    0 3092100  332 615860  0  0   15  616   606 1040  1  0 99  0
 0  0    0 3092148  332 615860  0  0    0  641   622 1085  0  0 99  0
 0  0    0 3092528  332 615860  0  0    0  766   654 1055  1  1 99  0
-> slow writeout until here, might be fixed with Peter's patch to
scale the background threshold
 2  0    0 3090828  332 615860  0  0    0 1804   707 1215  0  2 98  0
 0  0    0 3091056  332 615864  0  0    0 3877   831 2047  1  4 94  1
 3  0    0 3090780  332 615864  0  0    0 2048   784 1154  1  2 97  1
 0  0    0 3091096  332 615864  0  0    0 2690   751 1538  0  3 96  1
 0  1    0 3091056  332 615864  0  0    0 2018   748  866  0  2 95  2
 2  0    0 3092960  332 615864  0  0    0 2076   719 1118  0  2 97  0
-> writeout "done", /proc/meminfo showed 0kb of dirty data remaining
 0  0    0 3093072  332 615864  0  0    0  645   646 1104  0  0 99  0
 0  0    0 3093532  332 615864  0  0    0  726   658 1223  0  1 99  0
 0  0    0 3093540  332 615864  0  0    0  801   699 1314  0  1 99  0
 0  0    0 3093580  332 615864  0  0    0  783   738 1350  0  1 99  0
 0  0    0 3093284  332 615920  0  0    6  746   655 1381  1  1 98  0
 0  0    0 3092872  332 615920  0  0    0  862   703 1391  1  1 98  0
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b swpd    free buff  cache si so   bi   bo    in   cs us sy id wa
 0  0    0 3093224  332 615920  0  0    0  799   676 1394  0  0 99  0
 0  0    0 3093304  332 615920  0  0    0  835   672 1514  1  1 98  0
 0  0    0 3093476  332 615920  0  0    0  784   641 1404  1  1 98  0
 0  0    0 3093264  332 615920  0  0    0  722   626 1483  1  1 99  0
 0  0    0 3093476  332 615920  0  0    0    7   328  350  0  0 99  0
 0  0    0 3093628  332 615920  0  0    0   11   332  407  0  0 99  0
-> disks finally go idle

Torsten

.config for 2.6.24-rc1+git attached

------=_Part_16409_14543774.1194175159246
Content-Type: application/x-gzip; name=config.gz
Content-Transfer-Encoding: base64
X-Attachment-Id: f_f8lgztwk
Content-Disposition: attachment; filename=config.gz
1WIKPsL0kYZ7Aq1YTMuPMH2kCzz5gCym9W2mdfSBmtbx4iM1faCf1ssPtCld +ftJSPQgGXfprbEYhNK7zv0EAQaeChDLCDGn1/DMwK5vAMKbzY1uctx+5fgm R3KTY3WTY32TI7j9MsHS07tXhtjuy0NF0q7x1izh1lNry7farYs662t5Gfzp bkjJIOI5yBL2/iMOPFsI7Oo+Qh8E7DMno6g5WD5QUlw4XN5eLl/v/np4/LfK HH/NOCAO9BAOa1ugHZua3ihTe6nZ1cTLEm5k5TFfvF0tBGDETTGj56AtE0dP CJ/l2twRZDAUW07z2dx7IagopEjcVIVTEyj1JKbIvseQ0pt5n6XKHCD7RjFN /M327nhgOeZCtpfOJiBjt2a8DjBnmc1gKHAppzozTVCwBu0lWy0wbtubInWe HVqBDRddy/HZMwgGWF5DzGY0ZzUpvYbwvUJq5kGKAyrohOxbefLINdURD8ow txwFFUBnfKCloo8wrm+0BrRuCILt4GILfTDX+kO1+SS+snfQiH+934mmh5JI H6LNqs8T6qpvHGH+JzEjXuNWHL3B+16q3dzaKwg6Bra5YqXwde0GDLP6STzJ ZjUOmKztuJj34AFvX12Nj1OKovKTPO96BEgEvoquES+jQfYtccxAjrIDuGVs i+o0NwqAr2uZz0Wunx5V7ssfS8QZuPVY+7HL48+35x+/NCM/PTavMwmyQ989 0Aa1w0wxCDuANmKl58RMoXRlAJWFweWJV9Gzgx1eXTi3m3PVqERQmvwug4Fb sSwVjWKa1fc29Wyk+IZcbZX9G4J/IT3X6kDeFIeM1HvcTCGxeGtnfkX0s1vH e0WkqEQ7FzNF2YR4zjbZ5IFlWxQTIs2XDlo8qZHtUeAihnEyKS/IcRBOyKfa ReW7JlhPybttEKa0nTYYZ5sJDd7Wblvt6oKiOU1oZz3gylAh/MkndeZ69rOe tlUZ/aZfF/JEsf30fU/VHH2iw7JgONHrudSGLsdNDQllvk1GHnPRhmom740Y j53U6VfmGE2rbrJppYc9+uIY/6hsN2TaOobxlHcI1T+ZFCTbI1zA3/8fljh0 NjbCIgysdHx8PFEWU8F9hLXugrk3NMTD1c8Ziz6PcIyS1iczqSixqFKhKL+0 JDMPefTKyTMkyBV0FoQrstOcQQdIhKDEn7OhGaoKzxBgHYYiZIySb5zNUa6D Qvd/FOhiEWCNjhqQYFGM4AU2iEKCXUGLy7GJxXsjX5CAJO7ki1XYLdgX+UYp z2DkzQKOwaAqC/kAZYgQ2p4Cx2DoNTEAiI2JuMO8AAA= ------=_Part_16409_14543774.1194175159246-- From owner-xfs@oss.sgi.com Sun Nov 4 04:39:10 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 04 Nov 2007 04:39:15 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-0.7 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS autolearn=ham version=3.3.0-r574664 Received: from lucidpixels.com (lucidpixels.com [75.144.35.66]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA4Cd9Vg021712 for ; Sun, 4 Nov 2007 04:39:10 -0800 Received: by lucidpixels.com (Postfix, from userid 1001) id EAF2B1C000262; Sun, 4 Nov 2007 07:39:13 -0500 
(EST) Received: from localhost (localhost [127.0.0.1]) by lucidpixels.com (Postfix) with ESMTP id D363B4019581; Sun, 4 Nov 2007 07:39:13 -0500 (EST) Date: Sun, 4 Nov 2007 07:39:13 -0500 (EST) From: Justin Piszcz X-X-Sender: jpiszcz@p34.internal.lan To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org cc: xfs@oss.sgi.com Subject: Re: 2.6.23.1: mdadm/raid5 hung/d-state (md3_raid5 stuck in endless loop?) In-Reply-To: Message-ID: References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Virus-Scanned: ClamAV 0.91.2/4671/Sat Nov 3 18:21:59 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13542 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jpiszcz@lucidpixels.com Precedence: bulk X-list: xfs

Time to reboot, before reboot:

top - 07:30:23 up 13 days, 13:33, 10 users, load average: 16.00, 15.99, 14.96
Tasks: 221 total, 7 running, 209 sleeping, 0 stopped, 5 zombie
Cpu(s): 0.0%us, 25.5%sy, 0.0%ni, 74.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8039432k total, 1744356k used, 6295076k free, 164k buffers
Swap: 16787768k total, 160k used, 16787608k free, 616960k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
  688 root  15  -5     0    0    0 R  100  0.0 121:21.43 md3_raid5
  273 root  20   0     0    0    0 D    0  0.0  14:40.68 pdflush
  274 root  20   0     0    0    0 D    0  0.0  13:00.93 pdflush

# cat /proc/fs/xfs/stat
extent_alloc 301974 256068291 310513 240764389
abt 1900173 15346352 738568 731314
blk_map 276979807 235589732 864002 211245834 591619 513439614 0
bmbt 50717 367726 14177 11846
dir 3818065 361561 359723 975628
trans 48452 2648064 570998
ig 6034530 2074424 43153 3960106 0 3869384 460831
log 282781 10454333 3028 399803 173488
push_ail 3267594 0 1620 2611 730365 0 4476 0 10269 0
xstrat 291940 0
rw 61423078 103732605
attr 0 0 0 0
icluster 312958 97323 419837
vnodes 90721 4019823 0 1926744 3929102 3929102 3929102 0
buf 14678900 11027087 3651843 25743 760449 0 0 15775888 280425
xpc 966925905920 1047628533165 1162276949815
debug 0

# cat meminfo
MemTotal:      8039432 kB
MemFree:       6287000 kB
Buffers:           164 kB
Cached:         617072 kB
SwapCached:          0 kB
Active:         178404 kB
Inactive:       589880 kB
SwapTotal:    16787768 kB
SwapFree:     16787608 kB
Dirty:          494280 kB
Writeback:       86004 kB
AnonPages:      151240 kB
Mapped:          17092 kB
Slab:           259696 kB
SReclaimable:   170876 kB
SUnreclaim:      88820 kB
PageTables:      11448 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:  20807484 kB
Committed_AS:   353536 kB
VmallocTotal: 34359738367 kB
VmallocUsed:     15468 kB
VmallocChunk: 34359722699 kB

# echo 3 > /proc/sys/vm/drop_caches
# cat /proc/meminfo
MemTotal:      8039432 kB
MemFree:       6418352 kB
Buffers:            32 kB
Cached:         597908 kB
SwapCached:          0 kB
Active:         172028 kB
Inactive:       579808 kB
SwapTotal:    16787768 kB
SwapFree:     16787608 kB
Dirty:          494312 kB
Writeback:       86004 kB
AnonPages:      154104 kB
Mapped:          17416 kB
Slab:           144072 kB
SReclaimable:    53100 kB
SUnreclaim:      90972 kB
PageTables:      11832 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:  20807484 kB
Committed_AS:   360748 kB
VmallocTotal: 34359738367 kB
VmallocUsed:     15468 kB
VmallocChunk: 34359722699 kB

Nothing is actually happening on the device itself however.

Device: rrqm/s wrqm/s  r/s  w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda       0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00  0.00  0.00
sdb       0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00  0.00  0.00
sdc       0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00  0.00  0.00
sdd       0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00  0.00  0.00
sde       0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00  0.00  0.00
sdf       0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00  0.00  0.00
sdg       0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00  0.00  0.00
sdh       0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00  0.00  0.00
sdi       0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00  0.00  0.00
sdj       0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00  0.00  0.00
sdk       0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00  0.00  0.00
sdl       0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00  0.00  0.00
md0       0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00  0.00  0.00
md3       0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00  0.00  0.00
md2       0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00  0.00  0.00
md1       0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00  0.00  0.00

# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b swpd    free buff  cache si so  bi  bo   in  cs us sy id wa
 6  0  160 6420244   32 600092  0  0 221 227    5   1  1  1 98  0
 6  0  160 6420228   32 600120  0  0   0   0 1015 142  0 25 75  0
 6  0  160 6420228   32 600120  0  0   0   0 1005 127  0 25 75  0
 6  0  160 6420228   32 600120  0  0   0  41 1022 151  0 26 74  0
 6  0  160 6420228   32 600120  0  0   0   0 1011 131  0 25 75  0
 6  0  160 6420228   32 600120  0  0   0   0 1013 124  0 25 75  0
 6  0  160 6420228   32 600120  0  0   0   0 1042 129  0 25 75  0

# uname -mr
2.6.23.1 x86_64

# cat /proc/vmstat
nr_free_pages 1598911
nr_inactive 146381
nr_active 42724
nr_anon_pages 37181
nr_mapped 4097
nr_file_pages 151975
nr_dirty 123572
nr_writeback 21501
nr_slab_reclaimable 16152
nr_slab_unreclaimable 24284
nr_page_table_pages 2823
nr_unstable 0
nr_bounce 0
nr_vmscan_write 20712
pgpgin 1015377151
pgpgout 1043634578
pswpin 0
pswpout 40
pgalloc_dma 4
pgalloc_dma32 319052932
pgalloc_normal 621945603
pgalloc_movable 0
pgfree 942598566
pgactivate 31123819
pgdeactivate 18438560
pgfault 360236898
pgmajfault 16158
pgrefill_dma 0
pgrefill_dma32 11683348
pgrefill_normal 18799274
pgrefill_movable 0
pgsteal_dma 0
pgsteal_dma32 176658679
pgsteal_normal 233628315
pgsteal_movable 0
pgscan_kswapd_dma 0
pgscan_kswapd_dma32 164181746
pgscan_kswapd_normal 217338820
pgscan_kswapd_movable 0
pgscan_direct_dma 0
pgscan_direct_dma32 13074075
pgscan_direct_normal 17342937
pgscan_direct_movable 0
pginodesteal 332816
slabs_scanned 12368000
kswapd_steal 380216091
kswapd_inodesteal 9858653
pageoutrun 1167045
allocstall 68454
pgrotated 40

# cat /proc/zoneinfo
Node 0, zone DMA
  pages free 2601
        min 3
        low 3
        high 4
        scanned 0 (a: 11 i: 12)
        spanned 4096
        present 2486
    nr_free_pages 2601
    nr_inactive 0
    nr_active 0
    nr_anon_pages 0
    nr_mapped 1
    nr_file_pages 0
    nr_dirty 0
    nr_writeback 0
    nr_slab_reclaimable 0
    nr_slab_unreclaimable 4
    nr_page_table_pages 0
    nr_unstable 0
    nr_bounce 0
    nr_vmscan_write 0
        protection: (0, 3246, 7917, 7917)
  pagesets
    cpu: 0 pcp: 0 count: 0 high: 0 batch: 1
    cpu: 0 pcp: 1 count: 0 high: 0 batch: 1
  vm stats threshold: 6
    cpu: 1 pcp: 0 count: 0 high: 0 batch: 1
    cpu: 1 pcp: 1 count: 0 high: 0 batch: 1
  vm stats threshold: 6
    cpu: 2 pcp: 0 count: 0 high: 0 batch: 1
    cpu: 2 pcp: 1 count: 0 high: 0 batch: 1
  vm stats threshold: 6
    cpu: 3 pcp: 0 count: 0 high: 0 batch: 1
    cpu: 3 pcp: 1 count: 0 high: 0 batch: 1
  vm stats threshold: 6
  all_unreclaimable: 1
  prev_priority: 12
  start_pfn: 0
Node 0, zone DMA32
  pages free 699197
        min 1166
        low 1457
        high 1749
        scanned 0 (a: 14 i: 0)
        spanned 1044480
        present 831104
    nr_free_pages 699197
    nr_inactive 38507
    nr_active 11855
    nr_anon_pages 11228
    nr_mapped 612
    nr_file_pages 39127
    nr_dirty 38462
    nr_writeback 34
    nr_slab_reclaimable 8164
    nr_slab_unreclaimable 4747
    nr_page_table_pages 756
    nr_unstable 0
    nr_bounce 0
    nr_vmscan_write 6132
        protection: (0, 0, 4671, 4671)
  pagesets
    cpu: 0 pcp: 0 count: 183 high: 186 batch: 31
    cpu: 0 pcp: 1 count: 52 high: 62 batch: 15
  vm stats threshold: 36
    cpu: 1 pcp: 0 count: 23 high: 186 batch: 31
    cpu: 1 pcp: 1 count: 14 high: 62 batch: 15
  vm stats threshold: 36
    cpu: 2 pcp: 0 count: 173 high: 186 batch: 31
    cpu: 2 pcp: 1 count: 61 high: 62 batch: 15
  vm stats threshold: 36
    cpu: 3 pcp: 0 count: 95 high: 186 batch: 31
    cpu: 3 pcp: 1 count: 57 high: 62 batch: 15
  vm stats threshold: 36
  all_unreclaimable: 0
  prev_priority: 12
  start_pfn: 4096
Node 0, zone Normal
  pages free 897091
        min 1678
        low 2097
        high 2517
        scanned 0 (a: 29 i: 0)
        spanned 1212416
        present 1195840
    nr_free_pages 897091
    nr_inactive 107874
    nr_active 30878
    nr_anon_pages 25956
    nr_mapped 3484
    nr_file_pages 112857
    nr_dirty 85110
    nr_writeback 21467
    nr_slab_reclaimable 7988
    nr_slab_unreclaimable 19546
    nr_page_table_pages 2067
    nr_unstable 0
    nr_bounce 0
    nr_vmscan_write 14580
        protection: (0, 0, 0, 0)
  pagesets
    cpu: 0 pcp: 0 count: 124 high: 186 batch: 31
    cpu: 0 pcp: 1 count: 1 high: 62 batch: 15
  vm stats threshold: 42
    cpu: 1 pcp: 0 count: 68 high: 186 batch: 31
    cpu: 1 pcp: 1 count: 9 high: 62 batch: 15
  vm stats threshold: 42
    cpu: 2 pcp: 0 count: 79 high: 186 batch: 31
    cpu: 2 pcp: 1 count: 10 high: 62 batch: 15
  vm stats threshold: 42
    cpu: 3 pcp: 0 count: 47 high: 186 batch: 31
    cpu: 3 pcp: 1 count: 60 high: 62 batch: 15
  vm stats threshold: 42
  all_unreclaimable: 0
  prev_priority: 12
  start_pfn: 1048576

On Sun, 4 Nov 2007, Justin Piszcz wrote:

> # ps auxww | grep D
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> root 273 0.0 0.0 0 0 ? D Oct21 14:40 [pdflush]
> root 274 0.0 0.0 0 0 ? D Oct21 13:00 [pdflush]
>
> After several days/weeks, this is the second time this has happened, while
> doing regular file I/O (decompressing a file), everything on the device went
> into D-state.
>
> # mdadm -D /dev/md3
> /dev/md3:
>         Version : 00.90.03
>   Creation Time : Wed Aug 22 10:38:53 2007
>      Raid Level : raid5
>      Array Size : 1318680576 (1257.59 GiB 1350.33 GB)
>   Used Dev Size : 146520064 (139.73 GiB 150.04 GB)
>    Raid Devices : 10
>   Total Devices : 10
> Preferred Minor : 3
>     Persistence : Superblock is persistent
>
>     Update Time : Sun Nov 4 06:38:29 2007
>           State : active
>  Active Devices : 10
> Working Devices : 10
>  Failed Devices : 0
>   Spare Devices : 0
>
>          Layout : left-symmetric
>      Chunk Size : 1024K
>
>            UUID : e37a12d1:1b0b989a:083fb634:68e9eb49
>          Events : 0.4309
>
>  Number  Major  Minor  RaidDevice  State
>     0      8      33       0       active sync  /dev/sdc1
>     1      8      49       1       active sync  /dev/sdd1
>     2      8      65       2       active sync  /dev/sde1
>     3      8      81       3       active sync  /dev/sdf1
>     4      8      97       4       active sync  /dev/sdg1
>     5      8     113       5       active sync  /dev/sdh1
>     6      8     129       6       active sync  /dev/sdi1
>     7      8     145       7       active sync  /dev/sdj1
>     8      8     161       8       active sync  /dev/sdk1
>     9      8     177       9       active sync  /dev/sdl1
>
> If I wanted to find out what is causing this, what type of debugging would I
> have to enable to track it down? Any attempt to read/write files on the
> devices fails (also going into d-state). Is there any useful information I
> can get currently before rebooting the machine?
>
> # pwd
> /sys/block/md3/md
> # ls
> array_state      dev-sdj1/         rd2@              stripe_cache_active
> bitmap_set_bits  dev-sdk1/         rd3@              stripe_cache_size
> chunk_size       dev-sdl1/         rd4@              suspend_hi
> component_size   layout            rd5@              suspend_lo
> dev-sdc1/        level             rd6@              sync_action
> dev-sdd1/        metadata_version  rd7@              sync_completed
> dev-sde1/        mismatch_cnt      rd8@              sync_speed
> dev-sdf1/        new_dev           rd9@              sync_speed_max
> dev-sdg1/        raid_disks        reshape_position  sync_speed_min
> dev-sdh1/        rd0@              resync_start
> dev-sdi1/        rd1@              safe_mode_delay
> # cat array_state
> active-idle
> # cat mismatch_cnt
> 0
> # cat stripe_cache_active
> 1
> # cat stripe_cache_size
> 16384
> # cat sync_action
> idle
> # cat /proc/mdstat
> Personalities : [raid1] [raid6] [raid5] [raid4]
> md1 : active raid1 sdb2[1] sda2[0]
>       136448 blocks [2/2] [UU]
>
> md2 : active raid1 sdb3[1] sda3[0]
>       129596288 blocks [2/2] [UU]
>
> md3 : active raid5 sdl1[9] sdk1[8] sdj1[7] sdi1[6] sdh1[5] sdg1[4] sdf1[3]
> sde1[2] sdd1[1] sdc1[0]
>       1318680576 blocks level 5, 1024k chunk, algorithm 2 [10/10]
> [UUUUUUUUUU]
>
> md0 : active raid1 sdb1[1] sda1[0]
>       16787776 blocks [2/2] [UU]
>
> unused devices:
> #
>
> Justin.
>

From owner-xfs@oss.sgi.com Sun Nov 4 04:51:56 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 04 Nov 2007 04:52:05 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-0.7 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS autolearn=ham version=3.3.0-r574664 Received: from lucidpixels.com (lucidpixels.com [75.144.35.66]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA4CptTc023254 for ; Sun, 4 Nov 2007 04:51:56 -0800 Received: by lucidpixels.com (Postfix, from userid 1001) id 123691C000263; Sun, 4 Nov 2007 07:52:00 -0500 (EST) Received: from localhost (localhost [127.0.0.1]) by lucidpixels.com (Postfix) with ESMTP id 0C91E4019581; Sun, 4 Nov 2007 07:52:00 -0500 (EST) Date: Sun, 4 Nov 2007 07:52:00 -0500 (EST) From: Justin Piszcz X-X-Sender: jpiszcz@p34.internal.lan To: Michael Tokarev cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, xfs@oss.sgi.com Subject: Re: 2.6.23.1: mdadm/raid5 hung/d-state In-Reply-To: <472DBF8C.2060508@msgid.tls.msk.ru> Message-ID: References: <472DBF8C.2060508@msgid.tls.msk.ru> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Virus-Scanned: ClamAV 0.91.2/4672/Sun Nov 4 03:38:42 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13543 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jpiszcz@lucidpixels.com Precedence: bulk X-list: xfs

On Sun, 4 Nov 2007, Michael Tokarev wrote:

> Justin Piszcz wrote:
>> # ps auxww | grep D
>> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>> root 273 0.0 0.0 0 0 ? D Oct21 14:40 [pdflush]
>> root 274 0.0 0.0 0 0 ? D Oct21 13:00 [pdflush]
>>
>> After several days/weeks, this is the second time this has happened,
>> while doing regular file I/O (decompressing a file), everything on the
>> device went into D-state.
>
> The next time you come across something like that, do a SysRq-T dump and
> post that. It shows a stack trace of all processes - and in particular,
> where exactly each task is stuck.
>
> /mjt
>

Yes I got it before I rebooted, ran that and then dmesg > file.

Here it is:

[1172609.665902] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80
[1172609.668768] ffffffff80747dc0 ffff81015c3aa918 ffff810091c899b4 ffff810091c899a8
[1172609.668871] Call Trace:
[1172609.674472] [] schedule_timeout+0x5f/0xd0
[1172609.677362] [] process_timeout+0x0/0x10
[1172609.680243] [] do_select+0x468/0x560
[1172609.683105] [] __pollwait+0x0/0x130
[1172609.685969] [] default_wake_function+0x0/0x10
[1172609.688851] [] default_wake_function+0x0/0x10
[1172609.691712] [] default_wake_function+0x0/0x10
[1172609.694534] [] default_wake_function+0x0/0x10
[1172609.697324] [] skb_copy_datagram_iovec+0x1a1/0x260
[1172609.700103] [] _spin_lock_bh+0x9/0x20
[1172609.702856] [] release_sock+0x13/0xb0
[1172609.705598] [] tcp_recvmsg+0x370/0x940
[1172609.708303] [] sock_common_recvmsg+0x30/0x50
[1172609.710999] [] sock_aio_read+0x11b/0x130
[1172609.713694] [] core_sys_select+0x209/0x300
[1172609.716397] [] autoremove_wake_function+0x0/0x30
[1172609.719112] [] default_wake_function+0x0/0x10
[1172609.721824] [] current_fs_time+0x1e/0x30
[1172609.724525] [] tty_ldisc_deref+0x52/0x80
[1172609.727215] [] sys_select+0xd1/0x1c0
[1172609.729880] [] system_call+0x7e/0x83
[1172609.732517]
[1172609.735115] bash S 0000000000000000 0 30959 30958
[1172609.737742] ffff810091c8be88 0000000000000086 0000000000000000 ffff8101ea172e20
[1172609.740404] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80
[1172609.743087] ffffffff80747dc0 ffff81015c3ab028 ffff810091c8be54 ffff810091c8be48
[1172609.743190] Call Trace:
[1172609.748404] [] do_wait+0x599/0xc90
[1172609.751071] [] __wake_up+0x43/0x70
[1172609.753714] [] default_wake_function+0x0/0x10
[1172609.756345] [] system_call+0x7e/0x83
[1172609.758967]
[1172609.761522] sr S 0000000000000000 0 30966 30959
[1172609.764123] ffff810122d7de88 0000000000000082 0000000000000000 ffff8101eab3ee20
[1172609.766769] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80
[1172609.769442] ffffffff80747dc0 ffff8101ea173028 ffff810122d7de54 ffff810122d7de48
[1172609.769545] Call Trace:
[1172609.774734] [] do_wait+0x599/0xc90
[1172609.777369] [] default_wake_function+0x0/0x10
[1172609.779999] [] system_call+0x7e/0x83
[1172609.782616]
[1172609.785168] screen S 0000000000000000 0 30972 30966
[1172609.787768] ffff810144597f68 0000000000000086 ffff810144597f30 00000000ffffffff
[1172609.790416] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80
[1172609.793085] ffffffff80747dc0 ffff8101eab3f028 ffff810144597f34 ffff810144597f28
[1172609.793188] Call Trace:
[1172609.798381] [] alarm_setitimer+0x35/0x70
[1172609.801049] [] sys_pause+0x19/0x30
[1172609.803705] [] system_call+0x7e/0x83
[1172609.806361]
[1172609.808980] sshd S 0000000000000000 0 30973 7582
[1172609.811659] ffff810084003bf8 0000000000000082 0000000000000000 ffffffff80508e74
[1172609.814376] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80
[1172609.817104] ffffffff80747dc0 ffff8101ea172208 ffff810084003bc4 ffff810084003bb8
[1172609.817207] Call Trace:
[1172609.822530] [] skb_queue_tail+0x24/0x60
[1172609.825292] [] schedule_timeout+0x95/0xd0
[1172609.828060] [] prepare_to_wait+0x23/0x80
[1172609.830820] [] unix_stream_recvmsg+0x386/0x550
[1172609.833587] [] autoremove_wake_function+0x0/0x30
[1172609.836344] [] link_path_walk+0x80/0xf0
[1172609.839074] [] sock_aio_read+0x11b/0x130
[1172609.841794] [] get_unused_fd_flags+0x79/0x120
[1172609.844488] [] do_sync_read+0xd9/0x120
[1172609.847161] [] autoremove_wake_function+0x0/0x30
[1172609.849848] [] __dentry_open+0x11f/0x1b0
[1172609.852541] [] do_filp_open+0x3a/0x50
[1172609.855235] [] vfs_read+0x157/0x160
[1172609.857922] [] sys_read+0x53/0x90
[1172609.860620] [] system_call+0x7e/0x83
[1172609.863343]
[1172609.866063] sshd S 0000000000000000 0 30975 30973
[1172609.868838] ffff810175c219e8 0000000000000086 ffff810175c219b0 0000000000000002
[1172609.871649] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80
[1172609.874490] ffffffff80747dc0 ffff81021b27d738 ffff810175c219b4 ffff810175c219a8
[1172609.874594] Call Trace:
[1172609.880153] [] schedule_timeout+0x5f/0xd0
[1172609.883020] [] process_timeout+0x0/0x10
[1172609.885890] [] do_select+0x468/0x560
[1172609.888742] [] __pollwait+0x0/0x130
[1172609.891581] [] default_wake_function+0x0/0x10
[1172609.894430] [] default_wake_function+0x0/0x10
[1172609.897258] [] default_wake_function+0x0/0x10
[1172609.900060] [] default_wake_function+0x0/0x10
[1172609.902841] [] add_partial+0x19/0x60
[1172609.905606] [] __slab_free+0x15d/0x310
[1172609.908363] [] _spin_lock_bh+0x9/0x20
[1172609.911093] [] release_sock+0x13/0xb0
[1172609.913795] [] tcp_recvmsg+0x370/0x940
[1172609.916486] [] sock_common_recvmsg+0x30/0x50
[1172609.919151] [] sock_aio_read+0x11b/0x130
[1172609.921799] [] core_sys_select+0x209/0x300
[1172609.924455] [] autoremove_wake_function+0x0/0x30
[1172609.927122] [] default_wake_function+0x0/0x10
[1172609.929786] [] current_fs_time+0x1e/0x30
[1172609.932438] [] tty_ldisc_deref+0x52/0x80
[1172609.935083] [] sys_select+0xd1/0x1c0
[1172609.937702] [] system_call+0x7e/0x83
[1172609.940292]
[1172609.942843] bash S 0000000000000000 0 30976 30975
[1172609.945423] ffff8101bf371e88 0000000000000082 0000000000000000 ffff81021e322710
[1172609.948037] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80
[1172609.950671] ffffffff80747dc0 ffff8101882bf738 ffff8101bf371e54 ffff8101bf371e48
[1172609.950774] Call Trace:
[1172609.955888] [] do_wait+0x599/0xc90
[1172609.958505] [] __wake_up+0x43/0x70
[1172609.961098] [] vfs_ioctl+0x220/0x2c0
[1172609.963662] [] default_wake_function+0x0/0x10
[1172609.966234] [] sys_ioctl+0x49/0x80
[1172609.968766] [] system_call+0x7e/0x83
[1172609.971279]
[1172609.973759] screen S 0000000000000000 0 30991 30976
[1172609.976308] ffff8101a8329f68 0000000000000086 0000000000000000 00000000ffffffff
[1172609.978892] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80
[1172609.981501] ffffffff80747dc0 ffff81021e322918 ffff8101a8329f34 ffff8101a8329f28
[1172609.981605] Call Trace:
[1172609.986634] [] alarm_setitimer+0x35/0x70
[1172609.989220] [] sys_pause+0x19/0x30
[1172609.991766] [] system_call+0x7e/0x83
[1172609.994292]
[1172609.996787] screen D ffff8100a18ff800 0 30992 30991
[1172609.999344] ffff8101a854dd28 0000000000000086 ffff81022854ddb7 ffff8101a854dcd8
[1172610.001953] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80
[1172610.004574] ffffffff80747dc0 ffff810170233028 ffffffff80656bcb ffffffff8021f8bc
[1172610.004677] Call Trace:
[1172610.009752] [] task_rq_lock+0x4c/0x90
[1172610.012366] [] try_to_wake_up+0x68/0x3b0
[1172610.014981] [] wait_for_completion+0x7d/0xc0
[1172610.017594] [] default_wake_function+0x0/0x10
[1172610.020208] [] flush_cpu_workqueue+0x6a/0x90
[1172610.022828] [] wq_barrier_func+0x0/0x10
[1172610.025447] [] flush_workqueue+0x33/0x50
[1172610.028076] [] release_dev+0x44f/0x750
[1172610.030710] [] mntput_no_expire+0x27/0xb0
[1172610.033339] [] tty_release+0x11/0x20
[1172610.035958] [] __fput+0xb1/0x1a0
[1172610.038547] [] filp_close+0x54/0x90
[1172610.041106] [] sys_close+0x96/0x100
[1172610.043652] [] system_call+0x7e/0x83
[1172610.046160]
[1172610.048618] bash ? 0000000000000000 0 30993 30992
[1172610.051135] ffff8101aa2a3ee8 0000000000000046 ffff8101aa2a3eb0 0000000000000011
[1172610.053708] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80
[1172610.056312] ffffffff80747dc0 ffff810170233738 ffff8101aa2a3eb4 ffff8101aa2a3ea8
[1172610.056415] Call Trace:
[1172610.061510] [] do_exit+0x5be/0x8a0
[1172610.064172] [] do_group_exit+0x2c/0x80
[1172610.066859] [] system_call+0x7e/0x83
[1172610.069537]
[1172610.072190] sshd S 0000000000000000 0 7001 7582
[1172610.074908] ffff8100792b1bf8 0000000000000082 0000000000000000 ffff8101e9c51b80
[1172610.077679] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80
[1172610.080477] ffffffff80747dc0 ffff8102234ff738 ffff8100792b1bc4 ffff8100792b1bb8
[1172610.080580] Call Trace:
[1172610.086042] [] schedule_timeout+0x95/0xd0
[1172610.088861] [] prepare_to_wait+0x23/0x80
[1172610.091673] [] unix_stream_recvmsg+0x386/0x550
[1172610.094492] [] autoremove_wake_function+0x0/0x30
[1172610.097318] [] link_path_walk+0x80/0xf0
[1172610.100148] [] sock_aio_read+0x11b/0x130
[1172610.102976] [] get_unused_fd_flags+0x79/0x120
[1172610.105822] [] do_sync_read+0xd9/0x120
[1172610.108651] [] autoremove_wake_function+0x0/0x30
[1172610.111495] [] __dentry_open+0x11f/0x1b0
[1172610.114319] [] do_filp_open+0x3a/0x50
[1172610.117118] [] vfs_read+0x157/0x160
[1172610.119902] [] sys_read+0x53/0x90
[1172610.122638] [] system_call+0x7e/0x83
[1172610.125360]
[1172610.128056] sshd S 0000000000000000 0 7003 7001
[1172610.130818] ffff8100675a39e8 0000000000000082 ffff8100675a39b0 0000000000000002
[1172610.133623] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80
[1172610.136446] ffffffff80747dc0 ffff810225459028 ffff8100675a39b4 ffff8100675a39a8
[1172610.136549] Call Trace:
[1172610.142064] [] schedule_timeout+0x5f/0xd0
[1172610.144899] [] process_timeout+0x0/0x10
[1172610.147716] [] do_select+0x468/0x560
[1172610.150495] [] __pollwait+0x0/0x130
[1172610.153260] [] default_wake_function+0x0/0x10
[1172610.156005] [] default_wake_function+0x0/0x10
[1172610.158707] [] default_wake_function+0x0/0x10
[1172610.161378] [] default_wake_function+0x0/0x10
[1172610.164026] [] skb_copy_datagram_iovec+0x1a1/0x260
[1172610.166675] [] _spin_lock_bh+0x9/0x20
[1172610.169315] [] release_sock+0x13/0xb0
[1172610.171917] [] tcp_recvmsg+0x370/0x940
[1172610.174494] [] sock_common_recvmsg+0x30/0x50
[1172610.177085] [] sock_aio_read+0x11b/0x130
[1172610.179638] [] core_sys_select+0x209/0x300
[1172610.182178] [] autoremove_wake_function+0x0/0x30
[1172610.184734] [] default_wake_function+0x0/0x10
[1172610.187290] [] current_fs_time+0x1e/0x30
[1172610.189837] [] tty_ldisc_deref+0x52/0x80
[1172610.192370] [] sys_select+0xd1/0x1c0
[1172610.194900] [] system_call+0x7e/0x83
[1172610.197426]
[1172610.199919] bash S 000000000000000e 0 7004 7003
[1172610.202470] ffff8100cc263e88 0000000000000082 80000000804ca065 ffff81022367f530
[1172610.205071] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80
[1172610.207699] ffffffff80747dc0 ffff8102234fe918 ffff8100cc263e38 ffff810035a16348
[1172610.207802] Call Trace:
[1172610.212949] [] do_page_fault+0x202/0x890
[1172610.215618] [] do_wait+0x599/0xc90
[1172610.218263] [] __wake_up+0x43/0x70
[1172610.220900] [] vfs_ioctl+0x220/0x2c0
[1172610.223509] [] default_wake_function+0x0/0x10
[1172610.226109] [] sys_ioctl+0x49/0x80
[1172610.228693] [] system_call+0x7e/0x83
[1172610.231240]
[1172610.233746] aur S 0000000000000000 0 7014 7004
[1172610.236319] ffff810098071e88 0000000000000086 ffff810098071e50 ffffffff80232c93
[1172610.238941] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80
[1172610.241566] ffffffff80747dc0 ffff81022367f738 ffff810098071e54 ffff810098071e48
[1172610.241669] Call Trace:
[1172610.246766] [] get_signal_to_deliver+0x73/0x470
[1172610.249380] [] do_wait+0x599/0xc90
[1172610.251983] [] default_wake_function+0x0/0x10
[1172610.254563] [] system_call+0x7e/0x83
[1172610.257122]
[1172610.259648] aur S 0000000000000004 0 7066 7014
[1172610.262226] ffff810085231e88 0000000000000086 ffff8101ea314ce8 ffffffff80232c93
[1172610.264844] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80
[1172610.267471] ffffffff80747dc0 ffff8101ea314918 ffffffff802302ce ffffffff8020b3d6
[1172610.267574] Call Trace:
[1172610.272674] [] get_signal_to_deliver+0x73/0x470
[1172610.275315] [] recalc_sigpending+0xe/0x30
[1172610.277948] [] do_notify_resume+0x536/0x7a0
[1172610.280577] [] do_wait+0x599/0xc90
[1172610.283199] [] default_wake_function+0x0/0x10
[1172610.285840] [] system_call+0x7e/0x83
[1172610.288491]
[1172610.291116] unrar D ffff8100aa785c80 0 7135 7066
[1172610.293792] ffff8101ecf4ddb8 0000000000000086 ffff8101ecf4dd80 0000000000000000
[1172610.296525] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80
[1172610.299256] ffffffff80747dc0 ffff81021e53f028 ffff8101ecf4dd84 ffff8101ecf4dd78
[1172610.299359] Call Trace:
[1172610.304629] [] vn_iowait+0x75/0xa0
[1172610.307301] [] autoremove_wake_function+0x0/0x30
[1172610.309979] [] xfs_trans_alloc+0x9c/0xb0
[1172610.312653] [] xfs_itruncate_start+0x35/0xe0
[1172610.315340] [] xfs_free_eofblocks+0x17a/0x280
[1172610.318032] [] xfs_release+0x134/0x1e0
[1172610.320711] [] xfs_file_release+0x1a/0x30
[1172610.323417] [] __fput+0xb1/0x1a0
[1172610.326144] [] filp_close+0x54/0x90
[1172610.328895] [] sys_close+0x96/0x100
[1172610.331631] [] system_call+0x7e/0x83
[1172610.334353]
[1172610.337050] sshd D 0000000000000000 0 7187 7582
[1172610.339811] ffff81002b62fd28 0000000000000086 ffff81002b62fcf0 ffff81002b62fcd8
[1172610.342618] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80
[1172610.345448] ffffffff80747dc0 ffff8101ccda3028 ffff81002b62fcf4 ffff81002b62fce8
[1172610.345551] Call Trace:
[1172610.351072] [] wait_for_completion+0x7d/0xc0
[1172610.353915] [] default_wake_function+0x0/0x10
[1172610.356765] [] flush_cpu_workqueue+0x6a/0x90
[1172610.359622] [] wq_barrier_func+0x0/0x10
[1172610.362477] [] flush_workqueue+0x33/0x50
[1172610.365337] [] release_dev+0x44f/0x750
[1172610.368184] [] sys_fchmodat+0x6a/0x120
[1172610.371026] [] tty_release+0x11/0x20
[1172610.373843] [] __fput+0xb1/0x1a0
[1172610.376628] [] filp_close+0x54/0x90
[1172610.379402] [] sys_close+0x96/0x100
[1172610.382135] [] system_call+0x7e/0x83
[1172610.384846]
[1172610.387529] sshd ? 0000000000000000 0 7218 7187
[1172610.390280] ffff81013bd7bee8 0000000000000046 ffff81013bd7beb0 0000000000000011
[1172610.393084] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80
[1172610.395907] ffffffff80747dc0 ffff8101bf6ae918 ffff81013bd7beb4 ffff81013bd7bea8
[1172610.396010] Call Trace:
[1172610.401528] [] __cond_resched+0x1c/0x50
[1172610.404362] [] do_exit+0x5be/0x8a0
[1172610.407192] [] do_group_exit+0x2c/0x80
[1172610.409993] [] system_call+0x7e/0x83
[1172610.412776]
[1172610.415520] sshd S 0000000000000000 0 7236 7582
[1172610.418293] ffff8101e4a89bf8 0000000000000082 ffff8101e4a89bc0 ffff81013bf542c0
[1172610.421090] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80
[1172610.423898] ffffffff80747dc0 ffff810168684208 ffff8101e4a89bc4 ffff8101e4a89bb8
[1172610.424001] Call Trace:
[1172610.429474] [] schedule_timeout+0x95/0xd0
[1172610.432284] [] prepare_to_wait+0x23/0x80
[1172610.435096] [] unix_stream_recvmsg+0x386/0x550
[1172610.437896] [] autoremove_wake_function+0x0/0x30
[1172610.440690] [] link_path_walk+0x80/0xf0
[1172610.443487] [] sock_aio_read+0x11b/0x130
[1172610.446249] [] get_unused_fd_flags+0x79/0x120
[1172610.448997] [] do_sync_read+0xd9/0x120
[1172610.451737] [] autoremove_wake_function+0x0/0x30
[1172610.454491] [] __dentry_open+0x11f/0x1b0
[1172610.457244] [] do_filp_open+0x3a/0x50
[1172610.459989] [] vfs_read+0x157/0x160
[1172610.462724] [] sys_read+0x53/0x90
[1172610.465430] [] system_call+0x7e/0x83
[1172610.468131]
[1172610.470765] sshd S 0000000000000000 0 7238 7236
[1172610.473440]
ffff810046e1f9e8 0000000000000082 ffff810046e1f9b0 0000000000000002 [1172610.476161] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172610.478873] ffffffff80747dc0 ffff810168685028 ffff810046e1f9b4 ffff810046e1f9a8 [1172610.478975] Call Trace: [1172610.484236] [] schedule_timeout+0x5f/0xd0 [1172610.486940] [] process_timeout+0x0/0x10 [1172610.489645] [] do_select+0x468/0x560 [1172610.492340] [] __pollwait+0x0/0x130 [1172610.495030] [] default_wake_function+0x0/0x10 [1172610.497738] [] default_wake_function+0x0/0x10 [1172610.500417] [] default_wake_function+0x0/0x10 [1172610.503076] [] default_wake_function+0x0/0x10 [1172610.505711] [] skb_copy_datagram_iovec+0x1a1/0x260 [1172610.508366] [] _spin_lock_bh+0x9/0x20 [1172610.511004] [] release_sock+0x13/0xb0 [1172610.513638] [] tcp_recvmsg+0x370/0x940 [1172610.516245] [] sock_common_recvmsg+0x30/0x50 [1172610.518841] [] sock_aio_read+0x11b/0x130 [1172610.521423] [] core_sys_select+0x209/0x300 [1172610.523974] [] autoremove_wake_function+0x0/0x30 [1172610.526518] [] default_wake_function+0x0/0x10 [1172610.529058] [] current_fs_time+0x1e/0x30 [1172610.531592] [] tty_ldisc_deref+0x52/0x80 [1172610.534118] [] sys_select+0xd1/0x1c0 [1172610.536645] [] system_call+0x7e/0x83 [1172610.539162] [1172610.541651] bash S 000000000000000e 0 7239 7238 [1172610.544203] ffff8100aae5fe88 0000000000000082 80000001bab2c065 ffff810145b6ae20 [1172610.546809] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172610.549446] ffffffff80747dc0 ffff810168685738 ffff8100aae5fe38 ffff810065785f18 [1172610.549550] Call Trace: [1172610.554709] [] do_page_fault+0x202/0x890 [1172610.557368] [] update_curr+0x109/0x120 [1172610.560022] [] do_wait+0x599/0xc90 [1172610.562647] [] __sched_text_start+0x166/0x23d [1172610.565267] [] __wake_up+0x43/0x70 [1172610.567871] [] vfs_ioctl+0x220/0x2c0 [1172610.570435] [] default_wake_function+0x0/0x10 [1172610.572998] [] sys_ioctl+0x49/0x80 [1172610.575555] [] 
system_call+0x7e/0x83 [1172610.578118] [1172610.580652] sshd S 0000000000000000 0 7248 7582 [1172610.583235] ffff8101120d5bf8 0000000000000082 ffff8101120d5bc0 ffff81001e998dc0 [1172610.585865] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172610.588489] ffffffff80747dc0 ffff810130906208 ffff8101120d5bc4 ffff8101120d5bb8 [1172610.588592] Call Trace: [1172610.593666] [] schedule_timeout+0x95/0xd0 [1172610.596253] [] prepare_to_wait+0x23/0x80 [1172610.598824] [] unix_stream_recvmsg+0x386/0x550 [1172610.601405] [] autoremove_wake_function+0x0/0x30 [1172610.603992] [] link_path_walk+0x80/0xf0 [1172610.606571] [] sock_aio_read+0x11b/0x130 [1172610.609138] [] get_unused_fd_flags+0x79/0x120 [1172610.611720] [] do_sync_read+0xd9/0x120 [1172610.614293] [] autoremove_wake_function+0x0/0x30 [1172610.616883] [] __dentry_open+0x11f/0x1b0 [1172610.619463] [] do_filp_open+0x3a/0x50 [1172610.622029] [] vfs_read+0x157/0x160 [1172610.624594] [] sys_read+0x53/0x90 [1172610.627144] [] system_call+0x7e/0x83 [1172610.629703] [1172610.632237] sshd S 0000000000000000 0 7250 7248 [1172610.634822] ffff810126f3d9e8 0000000000000086 ffff810126f3d9b0 0000000000000002 [1172610.637453] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172610.640086] ffffffff80747dc0 ffff810130907028 ffff810126f3d9b4 ffff810126f3d9a8 [1172610.640190] Call Trace: [1172610.645268] [] schedule_timeout+0x5f/0xd0 [1172610.647857] [] process_timeout+0x0/0x10 [1172610.650429] [] do_select+0x468/0x560 [1172610.652990] [] __pollwait+0x0/0x130 [1172610.655552] [] default_wake_function+0x0/0x10 [1172610.658131] [] default_wake_function+0x0/0x10 [1172610.660680] [] default_wake_function+0x0/0x10 [1172610.663230] [] default_wake_function+0x0/0x10 [1172610.665779] [] skb_copy_datagram_iovec+0x1a1/0x260 [1172610.668368] [] _spin_lock_bh+0x9/0x20 [1172610.670946] [] release_sock+0x13/0xb0 [1172610.673510] [] tcp_recvmsg+0x370/0x940 [1172610.676075] [] sock_common_recvmsg+0x30/0x50 
[1172610.678653] [] sock_aio_read+0x11b/0x130 [1172610.681222] [] core_sys_select+0x209/0x300 [1172610.683798] [] autoremove_wake_function+0x0/0x30 [1172610.686386] [] default_wake_function+0x0/0x10 [1172610.688970] [] current_fs_time+0x1e/0x30 [1172610.691546] [] tty_ldisc_deref+0x52/0x80 [1172610.694114] [] sys_select+0xd1/0x1c0 [1172610.696683] [] system_call+0x7e/0x83 [1172610.699244] [1172610.701782] bash S 000000000000000e 0 7251 7250 [1172610.704370] ffff810121e8de88 0000000000000086 800000008e47c065 ffff8101afbec710 [1172610.707005] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172610.709641] ffffffff80747dc0 ffff810130907738 ffff810121e8de38 ffff8101e65ef9d8 [1172610.709744] Call Trace: [1172610.714827] [] do_page_fault+0x202/0x890 [1172610.717419] [] do_wait+0x599/0xc90 [1172610.719979] [] __wake_up+0x43/0x70 [1172610.722535] [] vfs_ioctl+0x220/0x2c0 [1172610.725088] [] default_wake_function+0x0/0x10 [1172610.727650] [] sys_ioctl+0x49/0x80 [1172610.730203] [] system_call+0x7e/0x83 [1172610.732759] [1172610.735278] su S 0000000000000000 0 7269 7251 [1172610.737850] ffff8101a5007e88 0000000000000086 ffff8101a5007e50 ffff8100219c0e20 [1172610.740475] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172610.743107] ffffffff80747dc0 ffff8101afbec918 ffff8101a5007e54 ffff8101a5007e48 [1172610.743210] Call Trace: [1172610.748316] [] do_wait+0x599/0xc90 [1172610.750913] [] default_wake_function+0x0/0x10 [1172610.753518] [] system_call+0x7e/0x83 [1172610.756084] [1172610.758600] bash S 0000000000000000 0 7270 7269 [1172610.761175] ffff81014bc9be88 0000000000000086 ffff81014bc9be50 ffff810139e7c000 [1172610.763792] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172610.766419] ffffffff80747dc0 ffff8100219c1028 ffff81014bc9be54 ffff81014bc9be48 [1172610.766521] Call Trace: [1172610.771636] [] do_wait+0x599/0xc90 [1172610.774264] [] __wake_up+0x43/0x70 [1172610.776885] [] vfs_ioctl+0x220/0x2c0 
[1172610.779492] [] default_wake_function+0x0/0x10 [1172610.782107] [] sys_ioctl+0x49/0x80 [1172610.784719] [] system_call+0x7e/0x83 [1172610.787329] [1172610.789920] sshd S 0000000000000000 0 7278 7582 [1172610.792579] ffff810194cf5bf8 0000000000000086 ffff810194cf5bc0 ffff810010755600 [1172610.795276] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172610.797987] ffffffff80747dc0 ffff81002d667738 ffff810194cf5bc4 ffff810194cf5bb8 [1172610.798090] Call Trace: [1172610.803311] [] schedule_timeout+0x95/0xd0 [1172610.805992] [] prepare_to_wait+0x23/0x80 [1172610.808641] [] unix_stream_recvmsg+0x386/0x550 [1172610.811284] [] autoremove_wake_function+0x0/0x30 [1172610.813937] [] link_path_walk+0x80/0xf0 [1172610.816593] [] sock_aio_read+0x11b/0x130 [1172610.819250] [] get_unused_fd_flags+0x79/0x120 [1172610.821914] [] do_sync_read+0xd9/0x120 [1172610.824602] [] autoremove_wake_function+0x0/0x30 [1172610.827337] [] __dentry_open+0x11f/0x1b0 [1172610.830101] [] do_filp_open+0x3a/0x50 [1172610.832855] [] vfs_read+0x157/0x160 [1172610.835593] [] sys_read+0x53/0x90 [1172610.838321] [] system_call+0x7e/0x83 [1172610.841049] [1172610.843744] sshd S 0000000000000000 0 7280 7278 [1172610.846501] ffff81013acb39e8 0000000000000082 ffff81013acb39b0 0000000000000002 [1172610.849305] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172610.852125] ffffffff80747dc0 ffff81011060e918 ffff81013acb39b4 ffff81013acb39a8 [1172610.852228] Call Trace: [1172610.857719] [] schedule_timeout+0x5f/0xd0 [1172610.860553] [] process_timeout+0x0/0x10 [1172610.863393] [] do_select+0x468/0x560 [1172610.866228] [] __pollwait+0x0/0x130 [1172610.869046] [] default_wake_function+0x0/0x10 [1172610.871874] [] default_wake_function+0x0/0x10 [1172610.874652] [] default_wake_function+0x0/0x10 [1172610.877377] [] default_wake_function+0x0/0x10 [1172610.880068] [] add_partial+0x19/0x60 [1172610.882722] [] __slab_free+0x15d/0x310 [1172610.885354] [] _spin_lock_bh+0x9/0x20 
[1172610.887984] [] release_sock+0x13/0xb0 [1172610.890616] [] tcp_recvmsg+0x370/0x940 [1172610.893251] [] sock_common_recvmsg+0x30/0x50 [1172610.895887] [] sock_aio_read+0x11b/0x130 [1172610.898512] [] core_sys_select+0x209/0x300 [1172610.901144] [] autoremove_wake_function+0x0/0x30 [1172610.903780] [] default_wake_function+0x0/0x10 [1172610.906421] [] current_fs_time+0x1e/0x30 [1172610.909040] [] tty_ldisc_deref+0x52/0x80 [1172610.911632] [] sys_select+0xd1/0x1c0 [1172610.914215] [] system_call+0x7e/0x83 [1172610.916754] [1172610.919253] bash S 000000000000000e 0 7281 7280 [1172610.921808] ffff8101919e3e88 0000000000000082 80000001542be065 ffff8100867c7530 [1172610.924409] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172610.927021] ffffffff80747dc0 ffff81011060f028 ffff8101919e3e38 ffff8101aae8a930 [1172610.927124] Call Trace: [1172610.932186] [] do_page_fault+0x202/0x890 [1172610.934771] [] do_wait+0x599/0xc90 [1172610.937337] [] __wake_up+0x43/0x70 [1172610.939863] [] default_wake_function+0x0/0x10 [1172610.942391] [] system_call+0x7e/0x83 [1172610.944923] [1172610.947429] su S 0000000000000000 0 7288 7281 [1172610.949987] ffff81004e873e88 0000000000000086 ffff81004e873e50 ffff81011060f530 [1172610.952588] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172610.955214] ffffffff80747dc0 ffff8100867c7738 ffff81004e873e54 ffff81004e873e48 [1172610.955317] Call Trace: [1172610.960412] [] do_wait+0x599/0xc90 [1172610.963007] [] default_wake_function+0x0/0x10 [1172610.965602] [] system_call+0x7e/0x83 [1172610.968186] [1172610.970703] bash S 0000000000000000 0 7289 7288 [1172610.973262] ffff810043dbfdb8 0000000000000082 ffff810043dbfd80 0000000000000fee [1172610.975867] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172610.978487] ffffffff80747dc0 ffff81011060f738 ffff810043dbfd84 ffff810043dbfd78 [1172610.978590] Call Trace: [1172610.983666] [] schedule_timeout+0x95/0xd0 [1172610.986293] [] 
add_wait_queue+0x1c/0x60 [1172610.988916] [] read_chan+0x228/0x6f0 [1172610.991531] [] default_wake_function+0x0/0x10 [1172610.994156] [] tty_read+0xb0/0x100 [1172610.996766] [] vfs_read+0xc5/0x160 [1172610.999359] [] sys_read+0x53/0x90 [1172611.001936] [] system_call+0x7e/0x83 [1172611.004527] [1172611.007092] strace S 0000000000000000 0 7319 7270 [1172611.009707] ffff8101534a9e88 0000000000000086 ffff8101534a9e50 0000000000000092 [1172611.012368] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172611.015032] ffffffff80747dc0 ffff810139e7c208 ffff8101534a9e54 ffff8101534a9e48 [1172611.015135] Call Trace: [1172611.020281] [] __group_send_sig_info+0x75/0xa0 [1172611.022914] [] do_wait+0x599/0xc90 [1172611.025526] [] kill_pid_info+0x51/0x90 [1172611.028133] [] default_wake_function+0x0/0x10 [1172611.030760] [] system_call+0x7e/0x83 [1172611.033389] [1172611.035983] rm D 0000000000000000 0 7463 7239 [1172611.038664] ffff8101254a3b08 0000000000000086 ffff8101254a3ad0 ffffffff80592c6c [1172611.041422] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172611.044223] ffffffff80747dc0 ffff810145b6b028 ffff8101254a3ad4 ffff8101254a3ac8 [1172611.044326] Call Trace: [1172611.049792] [] __down+0x10c/0x11f [1172611.052604] [] __down+0xa7/0x11f [1172611.055405] [] default_wake_function+0x0/0x10 [1172611.058227] [] __down_failed+0x35/0x3a [1172611.061044] [] xfs_buf_lock+0x3e/0x40 [1172611.063866] [] xfs_getsb+0x15/0x40 [1172611.066676] [] xfs_trans_getsb+0x5a/0xb0 [1172611.069478] [] xfs_trans_apply_sb_deltas+0xf/0x370 [1172611.072281] [] _xfs_trans_commit+0x9e/0x3c0 [1172611.075085] [] __up_read+0x21/0xb0 [1172611.077884] [] xfs_free_extent+0xe2/0x110 [1172611.080690] [] kmem_zone_alloc+0x5c/0xd0 [1172611.083499] [] kmem_zone_alloc+0x5c/0xd0 [1172611.086267] [] kmem_zone_zalloc+0x32/0x50 [1172611.089024] [] xfs_itruncate_finish+0xdb/0x320 [1172611.091768] [] xfs_inactive+0x3f1/0x520 [1172611.094486] [] xfs_fs_clear_inode+0xa9/0x100 
[1172611.097203] [] clear_inode+0x58/0xf0 [1172611.099883] [] generic_delete_inode+0xe9/0xf0 [1172611.102557] [] do_unlinkat+0x14a/0x1c0 [1172611.105235] [] error_exit+0x0/0x84 [1172611.107916] [] system_call+0x7e/0x83 [1172611.110592] [1172611.113232] pickup S 0000000000000000 0 7573 30580 [1172611.115922] ffff81021d34be58 0000000000000086 ffff81021d34be20 0000000000000000 [1172611.118661] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172611.121398] ffffffff80747dc0 ffff8101964af028 ffff81021d34be24 ffff81021d34be18 [1172611.121501] Call Trace: [1172611.126801] [] schedule_timeout+0x5f/0xd0 [1172611.129501] [] process_timeout+0x0/0x10 [1172611.132189] [] sys_epoll_wait+0x1bd/0x4e0 [1172611.134877] [] default_wake_function+0x0/0x10 [1172611.137570] [] system_call+0x7e/0x83 [1172611.140247] [1172611.142890] bash D 0000000000000000 0 8896 1 [1172611.145570] ffff8101cdf07ac8 0000000000000046 ffff8101cdf07a90 ffff810226a79800 [1172611.148276] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172611.150995] ffffffff80747dc0 ffff810114417738 ffff8101cdf07a94 ffff8101cdf07a88 [1172611.151098] Call Trace: [1172611.156317] [] wait_for_completion+0x7d/0xc0 [1172611.159006] [] default_wake_function+0x0/0x10 [1172611.161706] [] flush_cpu_workqueue+0x6a/0x90 [1172611.164401] [] wq_barrier_func+0x0/0x10 [1172611.167086] [] flush_workqueue+0x33/0x50 [1172611.169785] [] release_dev+0x44f/0x750 [1172611.172499] [] __sched_text_start+0x166/0x23d [1172611.175232] [] tty_release+0x11/0x20 [1172611.177948] [] __fput+0xb1/0x1a0 [1172611.180652] [] filp_close+0x54/0x90 [1172611.183331] [] put_files_struct+0xb1/0xd0 [1172611.185991] [] do_exit+0x1a9/0x8a0 [1172611.188636] [] __dequeue_signal+0x165/0x1f0 [1172611.191258] [] do_group_exit+0x2c/0x80 [1172611.193857] [] get_signal_to_deliver+0x2c7/0x470 [1172611.196464] [] do_notify_resume+0xc5/0x7a0 [1172611.199077] [] send_signal+0x62/0x1f0 [1172611.201678] [] __group_send_sig_info+0x75/0xa0 
[1172611.204289] [] group_send_sig_info+0x6e/0x90 [1172611.206890] [] sys_rt_sigreturn+0x324/0x3d0 [1172611.209498] [] sys_rt_sigaction+0x8e/0xc0 [1172611.212068] [] int_signal+0x12/0x17 [1172611.214618] [1172611.217129] su ? 0000000000000000 0 8903 8896 [1172611.219666] ffff8101e685dee8 0000000000000046 ffff8101e685deb0 0000000000000011 [1172611.222241] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172611.224859] ffffffff80747dc0 ffff8101158e9028 ffff8101e685deb4 ffff8101e685dea8 [1172611.224962] Call Trace: [1172611.230051] [] do_exit+0x5be/0x8a0 [1172611.232666] [] do_group_exit+0x2c/0x80 [1172611.235284] [] system_call+0x7e/0x83 [1172611.237904] [1172611.240493] bash D ffff8101bfb7e600 0 8977 1 [1172611.243132] ffff810106e37ac8 0000000000000046 ffff810106e37c08 ffff810226a79800 [1172611.245831] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172611.248548] ffffffff80747dc0 ffff81018e79d738 0000000000000000 0000000000000000 [1172611.248652] Call Trace: [1172611.253996] [] wait_for_completion+0x7d/0xc0 [1172611.256787] [] default_wake_function+0x0/0x10 [1172611.259581] [] flush_cpu_workqueue+0x6a/0x90 [1172611.262374] [] wq_barrier_func+0x0/0x10 [1172611.265167] [] flush_workqueue+0x33/0x50 [1172611.267952] [] release_dev+0x44f/0x750 [1172611.270733] [] tty_release+0x11/0x20 [1172611.273502] [] __fput+0xb1/0x1a0 [1172611.276259] [] filp_close+0x54/0x90 [1172611.279016] [] put_files_struct+0xb1/0xd0 [1172611.281764] [] do_exit+0x1a9/0x8a0 [1172611.284504] [] __dequeue_signal+0x165/0x1f0 [1172611.287232] [] do_group_exit+0x2c/0x80 [1172611.289939] [] get_signal_to_deliver+0x2c7/0x470 [1172611.292654] [] do_notify_resume+0xc5/0x7a0 [1172611.295338] [] send_signal+0x62/0x1f0 [1172611.298000] [] __group_send_sig_info+0x75/0xa0 [1172611.300678] [] group_send_sig_info+0x6e/0x90 [1172611.303357] [] sys_rt_sigreturn+0x324/0x3d0 [1172611.306036] [] sys_rt_sigaction+0x8e/0xc0 [1172611.308696] [] int_signal+0x12/0x17 
[1172611.311338] [1172611.313951] su ? 0000000000000000 0 8984 8977 [1172611.316601] ffff810151203ee8 0000000000000046 ffff810151203eb0 0000000000000011 [1172611.319282] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172611.321981] ffffffff80747dc0 ffff81021a284918 ffff810151203eb4 ffff810151203ea8 [1172611.322083] Call Trace: [1172611.327263] [] do_exit+0x5be/0x8a0 [1172611.329910] [] do_group_exit+0x2c/0x80 [1172611.332547] [] system_call+0x7e/0x83 [1172611.335180] [1172611.337787] sshd S 0000000000000000 0 9072 7582 [1172611.340453] ffff81012ee91bf8 0000000000000082 ffff81012ee91bc0 ffff8101b0d95080 [1172611.343161] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172611.345879] ffffffff80747dc0 ffff8101cb862918 ffff81012ee91bc4 ffff81012ee91bb8 [1172611.345982] Call Trace: [1172611.351307] [] schedule_timeout+0x95/0xd0 [1172611.354060] [] prepare_to_wait+0x23/0x80 [1172611.356818] [] unix_stream_recvmsg+0x386/0x550 [1172611.359584] [] autoremove_wake_function+0x0/0x30 [1172611.362359] [] link_path_walk+0x80/0xf0 [1172611.365126] [] sock_aio_read+0x11b/0x130 [1172611.367882] [] get_unused_fd_flags+0x79/0x120 [1172611.370649] [] do_sync_read+0xd9/0x120 [1172611.373404] [] autoremove_wake_function+0x0/0x30 [1172611.376174] [] pick_next_task_fair+0x42/0x70 [1172611.378939] [] __sched_text_start+0x166/0x23d [1172611.381719] [] do_filp_open+0x3a/0x50 [1172611.384497] [] vfs_read+0x157/0x160 [1172611.387260] [] sys_read+0x53/0x90 [1172611.390008] [] system_call+0x7e/0x83 [1172611.392729] [1172611.395395] sshd S 0000000000000000 0 9074 9072 [1172611.398114] ffff8101677179e8 0000000000000086 ffff8101677179b0 0000000000000002 [1172611.400847] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172611.403591] ffffffff80747dc0 ffff8101cb863738 ffff8101677179b4 ffff8101677179a8 [1172611.403694] Call Trace: [1172611.409063] [] schedule_timeout+0x5f/0xd0 [1172611.411822] [] process_timeout+0x0/0x10 [1172611.414587] 
[] do_select+0x468/0x560 [1172611.417307] [] __pollwait+0x0/0x130 [1172611.420004] [] default_wake_function+0x0/0x10 [1172611.422704] [] default_wake_function+0x0/0x10 [1172611.425339] [] default_wake_function+0x0/0x10 [1172611.427923] [] default_wake_function+0x0/0x10 [1172611.430472] [] add_partial+0x19/0x60 [1172611.433013] [] __slab_free+0x15d/0x310 [1172611.435547] [] _spin_lock_bh+0x9/0x20 [1172611.438080] [] release_sock+0x13/0xb0 [1172611.440607] [] tcp_recvmsg+0x370/0x940 [1172611.443136] [] sock_common_recvmsg+0x30/0x50 [1172611.445679] [] sock_aio_read+0x11b/0x130 [1172611.448224] [] core_sys_select+0x209/0x300 [1172611.450769] [] autoremove_wake_function+0x0/0x30 [1172611.453330] [] default_wake_function+0x0/0x10 [1172611.455892] [] current_fs_time+0x1e/0x30 [1172611.458451] [] tty_ldisc_deref+0x52/0x80 [1172611.460996] [] sys_select+0xd1/0x1c0 [1172611.463530] [] system_call+0x7e/0x83 [1172611.466056] [1172611.468545] bash S 0000000000000000 0 9075 9074 [1172611.471088] ffff8101a8d01db8 0000000000000086 ffff8101a8d01d80 0000000000000ff5 [1172611.473676] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172611.476296] ffffffff80747dc0 ffff81010f26c208 ffff8101a8d01d84 ffff8101a8d01d78 [1172611.476398] Call Trace: [1172611.481491] [] schedule_timeout+0x95/0xd0 [1172611.484118] [] add_wait_queue+0x1c/0x60 [1172611.486724] [] read_chan+0x228/0x6f0 [1172611.489303] [] default_wake_function+0x0/0x10 [1172611.491890] [] tty_read+0xb0/0x100 [1172611.494437] [] vfs_read+0xc5/0x160 [1172611.496960] [] sys_read+0x53/0x90 [1172611.499471] [] system_call+0x7e/0x83 [1172611.501978] [1172611.504443] sshd S 0000000000000000 0 9477 7582 [1172611.506967] ffff810122bb5bf8 0000000000000082 ffff810122bb5bc0 ffff810102e23600 [1172611.509518] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172611.512071] ffffffff80747dc0 ffff81019b72e918 ffff810122bb5bc4 ffff810122bb5bb8 [1172611.512174] Call Trace: [1172611.517102] [] 
schedule_timeout+0x95/0xd0 [1172611.519615] [] prepare_to_wait+0x23/0x80 [1172611.522128] [] unix_stream_recvmsg+0x386/0x550 [1172611.524648] [] autoremove_wake_function+0x0/0x30 [1172611.527170] [] link_path_walk+0x80/0xf0 [1172611.529681] [] sock_aio_read+0x11b/0x130 [1172611.532193] [] get_unused_fd_flags+0x79/0x120 [1172611.534712] [] do_sync_read+0xd9/0x120 [1172611.537222] [] autoremove_wake_function+0x0/0x30 [1172611.539741] [] pick_next_task_fair+0x42/0x70 [1172611.542257] [] __sched_text_start+0x166/0x23d [1172611.544776] [] do_filp_open+0x3a/0x50 [1172611.547286] [] vfs_read+0x157/0x160 [1172611.549793] [] sys_read+0x53/0x90 [1172611.552298] [] system_call+0x7e/0x83 [1172611.554805] [1172611.557268] sshd S 0000000000000000 0 9479 9477 [1172611.559791] ffff8101d7f7b9e8 0000000000000082 ffff8101d7f7b9b0 0000000000000002 [1172611.562340] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172611.564892] ffffffff80747dc0 ffff81019b72f738 ffff8101d7f7b9b4 ffff8101d7f7b9a8 [1172611.564995] Call Trace: [1172611.569926] [] schedule_timeout+0x5f/0xd0 [1172611.572443] [] process_timeout+0x0/0x10 [1172611.574957] [] do_select+0x468/0x560 [1172611.577469] [] __pollwait+0x0/0x130 [1172611.579979] [] default_wake_function+0x0/0x10 [1172611.582500] [] default_wake_function+0x0/0x10 [1172611.585023] [] default_wake_function+0x0/0x10 [1172611.587546] [] default_wake_function+0x0/0x10 [1172611.590069] [] skb_copy_datagram_iovec+0x1a1/0x260 [1172611.592602] [] _spin_lock_bh+0x9/0x20 [1172611.595136] [] release_sock+0x13/0xb0 [1172611.597669] [] tcp_recvmsg+0x370/0x940 [1172611.600206] [] sock_common_recvmsg+0x30/0x50 [1172611.602755] [] sock_aio_read+0x11b/0x130 [1172611.605295] [] core_sys_select+0x209/0x300 [1172611.607838] [] autoremove_wake_function+0x0/0x30 [1172611.610396] [] default_wake_function+0x0/0x10 [1172611.612949] [] current_fs_time+0x1e/0x30 [1172611.615496] [] tty_ldisc_deref+0x52/0x80 [1172611.618033] [] sys_select+0xd1/0x1c0 
[1172611.620569] [] system_call+0x7e/0x83 [1172611.623100] [1172611.625606] bash S 7fffffffffffffff 0 9480 9479 [1172611.628160] ffff8101d7ed1db8 0000000000000086 000000000000000b 0000000000000ff5 [1172611.630773] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172611.633395] ffffffff80747dc0 ffff8101cb7ee208 0000000000000000 ffff8101657c8018 [1172611.633497] Call Trace: [1172611.638557] [] schedule_timeout+0x95/0xd0 [1172611.641136] [] add_wait_queue+0x1c/0x60 [1172611.643699] [] read_chan+0x228/0x6f0 [1172611.646256] [] default_wake_function+0x0/0x10 [1172611.648829] [] tty_read+0xb0/0x100 [1172611.651389] [] vfs_read+0xc5/0x160 [1172611.653928] [] sys_read+0x53/0x90 [1172611.656463] [] system_call+0x7e/0x83 [1172611.659013] [1172611.661536] su S 0000000000000000 0 9613 1 [1172611.664103] ffff8101c3c57e88 0000000000000086 ffff8101c3c57e50 ffff810117ac0000 [1172611.666717] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172611.669327] ffffffff80747dc0 ffff810106581028 ffff8101c3c57e54 ffff8101c3c57e48 [1172611.669430] Call Trace: [1172611.674472] [] do_wait+0x599/0xc90 [1172611.677029] [] default_wake_function+0x0/0x10 [1172611.679584] [] system_call+0x7e/0x83 [1172611.682132] [1172611.684643] bash S 000000000000000e 0 9614 9613 [1172611.687205] ffff8101ebc27e88 0000000000000082 80000001df3f8065 ffff8101a86f6710 [1172611.689809] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172611.692439] ffffffff80747dc0 ffff810117ac0208 ffff8101ebc27e38 ffff8101786d7a80 [1172611.692542] Call Trace: [1172611.697656] [] do_page_fault+0x202/0x890 [1172611.700289] [] update_curr+0x109/0x120 [1172611.702917] [] do_wait+0x599/0xc90 [1172611.705533] [] __sched_text_start+0x166/0x23d [1172611.708163] [] __wake_up+0x43/0x70 [1172611.710787] [] vfs_ioctl+0x220/0x2c0 [1172611.713422] [] default_wake_function+0x0/0x10 [1172611.716078] [] sys_ioctl+0x49/0x80 [1172611.718716] [] system_call+0x7e/0x83 [1172611.721349] 
[1172611.723928] bash D ffff81017bb82900 0 9632 1 [1172611.726540] ffff8101514abac8 0000000000000046 ffff8101514abc08 ffff810226a79800 [1172611.729205] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172611.731857] ffffffff80747dc0 ffff810136f4c918 ffff810007139a50 ffff810004cd4a50 [1172611.731960] Call Trace: [1172611.737092] [] wait_for_completion+0x7d/0xc0 [1172611.739754] [] default_wake_function+0x0/0x10 [1172611.742431] [] flush_cpu_workqueue+0x6a/0x90 [1172611.745109] [] wq_barrier_func+0x0/0x10 [1172611.747812] [] flush_workqueue+0x33/0x50 [1172611.750540] [] release_dev+0x44f/0x750 [1172611.753296] [] __sched_text_start+0x166/0x23d [1172611.756058] [] tty_release+0x11/0x20 [1172611.758805] [] __fput+0xb1/0x1a0 [1172611.761549] [] filp_close+0x54/0x90 [1172611.764295] [] put_files_struct+0xb1/0xd0 [1172611.767039] [] do_exit+0x1a9/0x8a0 [1172611.769781] [] __dequeue_signal+0x165/0x1f0 [1172611.772533] [] do_group_exit+0x2c/0x80 [1172611.775279] [] get_signal_to_deliver+0x2c7/0x470 [1172611.778032] [] do_notify_resume+0xc5/0x7a0 [1172611.780784] [] send_signal+0x62/0x1f0 [1172611.783537] [] __group_send_sig_info+0x75/0xa0 [1172611.786308] [] group_send_sig_info+0x6e/0x90 [1172611.789086] [] sys_rt_sigreturn+0x324/0x3d0 [1172611.791858] [] sys_rt_sigaction+0x8e/0xc0 [1172611.794615] [] int_signal+0x12/0x17 [1172611.797334] [1172611.799993] su ? 
0000000000000000 0 9639 9632 [1172611.802704] ffff8101b98afee8 0000000000000046 ffff8101b98afeb0 0000000000000011 [1172611.805431] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172611.808166] ffffffff80747dc0 ffff8101243a7028 ffff8101b98afeb4 ffff8101b98afea8 [1172611.808269] Call Trace: [1172611.813600] [] do_exit+0x5be/0x8a0 [1172611.816333] [] do_group_exit+0x2c/0x80 [1172611.819057] [] system_call+0x7e/0x83 [1172611.821794] [1172611.824519] mdadm D 0000000000000000 0 9783 9614 [1172611.827312] ffff8101aea09a18 0000000000000082 ffff8101aea099e0 ffff8101aea09998 [1172611.830142] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172611.832994] ffffffff80747dc0 ffff8101a86f6918 ffff8101aea099e4 ffff8101aea099d8 [1172611.833098] Call Trace: [1172611.838611] [] __wake_up+0x43/0x70 [1172611.841427] [] sync_page+0x0/0x50 [1172611.844200] [] io_schedule+0x28/0x40 [1172611.846961] [] sync_page+0x3b/0x50 [1172611.849716] [] __wait_on_bit_lock+0x4a/0x80 [1172611.852476] [] __lock_page+0x5f/0x70 [1172611.855220] [] wake_bit_function+0x0/0x30 [1172611.857968] [] pagevec_lookup_tag+0x1a/0x30 [1172611.860699] [] write_cache_pages+0x191/0x340 [1172611.863407] [] __writepage+0x0/0x30 [1172611.866104] [] do_writepages+0x20/0x40 [1172611.868763] [] __writeback_single_inode+0x2d9/0x400 [1172611.871430] [] __wake_up+0x43/0x70 [1172611.874087] [] sync_sb_inodes+0x21a/0x300 [1172611.876755] [] sync_inodes_sb+0xa1/0xc0 [1172611.879405] [] __fsync_super+0xb/0x70 [1172611.882049] [] fsync_super+0x9/0x20 [1172611.884692] [] fsync_bdev+0x26/0x60 [1172611.887318] [] blkdev_ioctl+0x1c7/0x7a0 [1172611.889939] [] handle_mm_fault+0x1a1/0x8a0 [1172611.892573] [] md_open+0x6a/0x90 [1172611.895186] [] blkdev_open+0x0/0x90 [1172611.897799] [] __up_read+0x21/0xb0 [1172611.900374] [] do_page_fault+0x202/0x890 [1172611.902936] [] blkdev_open+0x3c/0x90 [1172611.905489] [] block_ioctl+0x1b/0x30 [1172611.907994] [] do_ioctl+0x2f/0xa0 [1172611.910470] [] 
vfs_ioctl+0x220/0x2c0 [1172611.912938] [] sys_ioctl+0x49/0x80 [1172611.915381] [] error_exit+0x0/0x84 [1172611.917816] [] system_call+0x7e/0x83 [1172611.920256] [1172611.922661] sshd S 0000000000000000 0 9793 7582 [1172611.925122] ffff8101a7fabbf8 0000000000000086 ffff8101a7fabbc0 ffff8101cd1f1600 [1172611.927626] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172611.930154] ffffffff80747dc0 ffff8101536d0918 ffff8101a7fabbc4 ffff8101a7fabbb8 [1172611.930258] Call Trace: [1172611.935198] [] schedule_timeout+0x95/0xd0 [1172611.937753] [] prepare_to_wait+0x23/0x80 [1172611.940308] [] unix_stream_recvmsg+0x386/0x550 [1172611.942871] [] autoremove_wake_function+0x0/0x30 [1172611.945439] [] link_path_walk+0x80/0xf0 [1172611.947997] [] sock_aio_read+0x11b/0x130 [1172611.950553] [] get_unused_fd_flags+0x79/0x120 [1172611.953111] [] do_sync_read+0xd9/0x120 [1172611.955662] [] autoremove_wake_function+0x0/0x30 [1172611.958225] [] pick_next_task_fair+0x42/0x70 [1172611.960800] [] __sched_text_start+0x166/0x23d [1172611.963379] [] do_filp_open+0x3a/0x50 [1172611.965946] [] vfs_read+0x157/0x160 [1172611.968507] [] sys_read+0x53/0x90 [1172611.971029] [] system_call+0x7e/0x83 [1172611.973523] [1172611.975978] sshd S 0000000000000000 0 9795 9793 [1172611.978461] ffff81021f41d9e8 0000000000000082 ffff81021f41d9b0 0000000000000002 [1172611.980981] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172611.983533] ffffffff80747dc0 ffff8101536d1738 ffff81021f41d9b4 ffff81021f41d9a8 [1172611.983636] Call Trace: [1172611.988598] [] schedule_timeout+0x5f/0xd0 [1172611.991157] [] process_timeout+0x0/0x10 [1172611.993698] [] do_select+0x468/0x560 [1172611.996208] [] __pollwait+0x0/0x130 [1172611.998705] [] default_wake_function+0x0/0x10 [1172612.001178] [] default_wake_function+0x0/0x10 [1172612.003617] [] default_wake_function+0x0/0x10 [1172612.006034] [] default_wake_function+0x0/0x10 [1172612.008424] [] add_partial+0x19/0x60 [1172612.010813] [] 
__slab_free+0x15d/0x310 [1172612.013194] [] _spin_lock_bh+0x9/0x20 [1172612.015567] [] release_sock+0x13/0xb0 [1172612.017935] [] tcp_recvmsg+0x370/0x940 [1172612.020296] [] sock_common_recvmsg+0x30/0x50 [1172612.022667] [] sock_aio_read+0x11b/0x130 [1172612.025029] [] core_sys_select+0x209/0x300 [1172612.027401] [] autoremove_wake_function+0x0/0x30 [1172612.029774] [] default_wake_function+0x0/0x10 [1172612.032143] [] current_fs_time+0x1e/0x30 [1172612.034510] [] tty_ldisc_deref+0x52/0x80 [1172612.036880] [] sys_select+0xd1/0x1c0 [1172612.039245] [] system_call+0x7e/0x83 [1172612.041607] [1172612.043951] bash S 000000000000000e 0 9796 9795 [1172612.046358] ffff81013de09e88 0000000000000086 8000000104441065 ffff8101125a2710 [1172612.048809] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172612.051282] ffffffff80747dc0 ffff81014da80208 ffff81013de09e38 ffff8101eab7d5e8 [1172612.051385] Call Trace: [1172612.056215] [] do_page_fault+0x202/0x890 [1172612.058715] [] update_curr+0x109/0x120 [1172612.061212] [] do_wait+0x599/0xc90 [1172612.063714] [] __sched_text_start+0x166/0x23d [1172612.066238] [] __wake_up+0x43/0x70 [1172612.068760] [] vfs_ioctl+0x220/0x2c0 [1172612.071291] [] default_wake_function+0x0/0x10 [1172612.073847] [] sys_ioctl+0x49/0x80 [1172612.076399] [] system_call+0x7e/0x83 [1172612.078935] [1172612.081438] su S 0000000000000000 0 9804 9796 [1172612.083976] ffff810184fdbe88 0000000000000082 ffff810184fdbe50 ffff810120808000 [1172612.086547] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172612.089126] ffffffff80747dc0 ffff8101125a2918 ffff810184fdbe54 ffff810184fdbe48 [1172612.089229] Call Trace: [1172612.094170] [] do_wait+0x599/0xc90 [1172612.096703] [] default_wake_function+0x0/0x10 [1172612.099244] [] system_call+0x7e/0x83 [1172612.101772] [1172612.104264] bash S 0000000000000000 0 9805 9804 [1172612.106820] ffff8101e88f7db8 0000000000000082 ffff8101e88f7d80 0000000000000ff9 [1172612.109419] 
ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172612.112036] ffffffff80747dc0 ffff810120808208 ffff8101e88f7d84 ffff8101e88f7d78 [1172612.112139] Call Trace: [1172612.117216] [] schedule_timeout+0x95/0xd0 [1172612.119833] [] add_wait_queue+0x1c/0x60 [1172612.122446] [] read_chan+0x228/0x6f0 [1172612.125058] [] default_wake_function+0x0/0x10 [1172612.127700] [] tty_read+0xb0/0x100 [1172612.130342] [] vfs_read+0xc5/0x160 [1172612.132958] [] sys_read+0x53/0x90 [1172612.135554] [] system_call+0x7e/0x83 [1172612.138121] [1172612.140634] smtpd S 0000000000000000 0 9847 30580 [1172612.143203] ffff8101a6e25e58 0000000000000086 ffff8101a6e25e20 ffff81022583d318 [1172612.145786] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172612.148371] ffffffff80747dc0 ffff8101d859a208 ffff8101a6e25e24 ffff8101a6e25e18 [1172612.148474] Call Trace: [1172612.153514] [] schedule_timeout+0x5f/0xd0 [1172612.156129] [] process_timeout+0x0/0x10 [1172612.158743] [] sys_epoll_wait+0x1bd/0x4e0 [1172612.161385] [] default_wake_function+0x0/0x10 [1172612.164066] [] system_call+0x7e/0x83 [1172612.166774] [1172612.169446] smtpd S ffff81022583d318 0 9963 30580 [1172612.172187] ffff8101c5f69eb8 0000000000000082 0000000000000000 ffffffff00000001 [1172612.174990] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172612.177819] ffffffff80747dc0 ffff810105b04918 0000000000000000 000000008bcec672 [1172612.177922] Call Trace: [1172612.183467] [] ns_to_timeval+0x9/0x40 [1172612.186321] [] flock_lock_file_wait+0x14d/0x300 [1172612.189190] [] autoremove_wake_function+0x0/0x30 [1172612.192057] [] sys_flock+0x16b/0x180 [1172612.194913] [] system_call+0x7e/0x83 [1172612.197759] [1172612.200578] cleanup S 0000000000000000 0 9966 30580 [1172612.203466] ffff8101b50b7e58 0000000000000082 ffff8101b50b7e20 ffff8101a496a828 [1172612.206409] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172612.209358] ffffffff80747dc0 
ffff810132074208 ffff8101b50b7e24 ffff8101b50b7e18 [1172612.209460] Call Trace: [1172612.215203] [] schedule_timeout+0x5f/0xd0 [1172612.218127] [] process_timeout+0x0/0x10 [1172612.221046] [] sys_epoll_wait+0x1bd/0x4e0 [1172612.223934] [] default_wake_function+0x0/0x10 [1172612.226813] [] system_call+0x7e/0x83 [1172612.229693] [1172612.232543] local S 0000000000000000 0 9967 30580 [1172612.235450] ffff8101c7bf9e58 0000000000000086 ffff8101c7bf9e20 0000000000000000 [1172612.238401] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 [1172612.241380] ffffffff80747dc0 ffff8101b11bd738 ffff8101c7bf9e24 ffff8101c7bf9e18 [1172612.241483] Call Trace: [1172612.247296] [] schedule_timeout+0x5f/0xd0 [1172612.250292] [] process_timeout+0x0/0x10 [1172612.253278] [] sys_epoll_wait+0x1bd/0x4e0 [1172612.256265] [] default_wake_function+0x0/0x10 [1172612.259237] [] system_call+0x7e/0x83 [1172612.262188] From owner-xfs@oss.sgi.com Sun Nov 4 06:59:29 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 04 Nov 2007 06:59:32 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-0.7 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_33, SPF_HELO_PASS autolearn=no version=3.3.0-r574664 Received: from lucidpixels.com (lucidpixels.com [75.144.35.66]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA4ExSPN016066 for ; Sun, 4 Nov 2007 06:59:28 -0800 Received: by lucidpixels.com (Postfix, from userid 1001) id 2D7B51C00026A; Sun, 4 Nov 2007 09:59:32 -0500 (EST) Received: from localhost (localhost [127.0.0.1]) by lucidpixels.com (Postfix) with ESMTP id 2822F4019B27; Sun, 4 Nov 2007 09:59:32 -0500 (EST) Date: Sun, 4 Nov 2007 09:59:32 -0500 (EST) From: Justin Piszcz X-X-Sender: jpiszcz@p34.internal.lan To: Michael Tokarev cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, xfs@oss.sgi.com Subject: Re: 2.6.23.1: mdadm/raid5 hung/d-state In-Reply-To: 
<472DDD78.7040002@msgid.tls.msk.ru>
Message-ID:
References: <472DBF8C.2060508@msgid.tls.msk.ru> <472DDD78.7040002@msgid.tls.msk.ru>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Virus-Scanned: ClamAV 0.91.2/4672/Sun Nov 4 03:38:42 2007 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 13544
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: jpiszcz@lucidpixels.com
Precedence: bulk
X-list: xfs

On Sun, 4 Nov 2007, Michael Tokarev wrote:

> Justin Piszcz wrote:
>> On Sun, 4 Nov 2007, Michael Tokarev wrote:
> []
>>> The next time you come across something like that, do a SysRq-T dump and
>>> post that. It shows a stack trace of all processes - and in particular,
>>> where exactly each task is stuck.
>
>> Yes I got it before I rebooted, ran that and then dmesg > file.
>>
>> Here it is:
>>
>> [1172609.665902] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80
>> [1172609.668768] ffffffff80747dc0 ffff81015c3aa918 ffff810091c899b4 ffff810091c899a8
>
> That's only a partial list. All the kernel threads - which are the most
> important in this context - aren't shown. You ran out of dmesg buffer, and
> the most interesting entries were at the beginning. If your /var/log
> partition is working, the stuff should be in /var/log/kern.log or
> equivalent. If it's not working, there is a way to capture the info still,
> by stopping syslogd, cat'ing /proc/kmsg to some tmpfs file and scp'ing it
> elsewhere.
>
> /mjt
>

Will do that the next time it happens, thanks.
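Once a complete SysRq-T dump has been saved (from /var/log/kern.log or a /proc/kmsg capture, as suggested above), finding the tasks stuck in D state by eye is tedious. The following is a minimal Python sketch, not anything posted in this thread, that groups such a dump by task and filters by state. The function names and the regular expressions are assumptions based only on the two line formats visible in this thread (dmesg lines prefixed with "[seconds.micros]" and syslog lines prefixed with "... kernel:"); other kernels may print the task header differently.

```python
import re

# Optional syslog prefix ("Nov 4 18:55:56 poulenc kernel: ") followed by an
# optional dmesg timestamp ("[1172611.922661] "). Both forms are assumptions
# taken from the dumps in this thread.
PREFIX = re.compile(
    r"^(?:\w{3}\s+\d+\s+[\d:]+\s+\S+\s+kernel:\s*)?(?:\[\s*\d+\.\d+\]\s*)?"
)

# Task header: name, state letter, PC (or "running task"), stack, pid, father.
TASK = re.compile(
    r"^(?P<name>\S+)\s+(?P<state>[RSDTZX])\s+"
    r"(?:[0-9a-f]{16}|running task)\s+"
    r"(?P<stack>\d+)\s+(?P<pid>\d+)\s+(?P<father>\d+)\b"
)

# Stack frame: "[<addr>] symbol+0xOFF/0xLEN"; the address may be missing.
FRAME = re.compile(r"^\[<?[0-9a-f]*>?\]\s+(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def parse_sysrq_t(text):
    """Return a list of tasks, each with its name, state, pid and frames."""
    tasks = []
    for raw in text.splitlines():
        line = PREFIX.sub("", raw.strip())
        header = TASK.match(line)
        if header:
            tasks.append({
                "name": header["name"],
                "state": header["state"],
                "pid": int(header["pid"]),
                "frames": [],
            })
            continue
        frame = FRAME.match(line)
        if frame and tasks:
            # Attach the frame to the most recently seen task header.
            tasks[-1]["frames"].append(frame["sym"])
    return tasks


def tasks_in_state(tasks, state="D"):
    """Keep only tasks whose state letter matches (D = uninterruptible)."""
    return [t for t in tasks if t["state"] == state]
```

A possible use is to read the saved log into a string, call parse_sysrq_t() on it, and print the name, pid and frames of every task returned by tasks_in_state(tasks, "D"). The parser needs one log entry per line, so a dump whose line breaks were lost has to be restored first.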
From owner-xfs@oss.sgi.com Sun Nov 4 07:21:03 2007
Received: with ECARTIS (v1.0.0; list xfs); Sun, 04 Nov 2007 07:21:07 -0800 (PST)
X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-6.5 required=5.0 tests=BAYES_00,J_CHICKENPOX_33, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.0-r574664
Received: from hobbit.corpit.ru (hobbit.corpit.ru [81.13.94.6]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA4FL0Fc021016 for ; Sun, 4 Nov 2007 07:21:03 -0800
Received: from [192.168.1.200] (mjt.ppp.tls.msk.ru [192.168.1.200]) by hobbit.corpit.ru (Postfix) with ESMTP id 3593335610; Sun, 4 Nov 2007 17:55:53 +0300 (MSK) (envelope-from mjt@tls.msk.ru)
Message-ID: <472DDD78.7040002@msgid.tls.msk.ru>
Date: Sun, 04 Nov 2007 17:55:52 +0300
From: Michael Tokarev
Organization: Telecom Service, JSC
User-Agent: Icedove 1.5.0.12 (X11/20070607)
MIME-Version: 1.0
To: Justin Piszcz
CC: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: 2.6.23.1: mdadm/raid5 hung/d-state
References: <472DBF8C.2060508@msgid.tls.msk.ru>
In-Reply-To:
X-Enigmail-Version: 0.94.2.0
OpenPGP: id=4F9CF57E
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV 0.91.2/4672/Sun Nov 4 03:38:42 2007 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 13545
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: mjt@tls.msk.ru
Precedence: bulk
X-list: xfs

Justin Piszcz wrote:
> On Sun, 4 Nov 2007, Michael Tokarev wrote:
[]
>> The next time you come across something like that, do a SysRq-T dump and
>> post that. It shows a stack trace of all processes - and in particular,
>> where exactly each task is stuck.

> Yes I got it before I rebooted, ran that and then dmesg > file.
>
> Here it is:
>
> [1172609.665902] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80
> [1172609.668768] ffffffff80747dc0 ffff81015c3aa918 ffff810091c899b4 ffff810091c899a8

That's only a partial list. All the kernel threads - which are the most
important in this context - aren't shown. You ran out of dmesg buffer, and
the most interesting entries were at the beginning. If your /var/log
partition is working, the stuff should be in /var/log/kern.log or
equivalent. If it's not working, there is a way to capture the info still,
by stopping syslogd, cat'ing /proc/kmsg to some tmpfs file and scp'ing it
elsewhere.

/mjt

From owner-xfs@oss.sgi.com Sun Nov 4 10:35:35 2007
Received: with ECARTIS (v1.0.0; list xfs); Sun, 04 Nov 2007 10:35:43 -0800 (PST)
X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=0.7 required=5.0 tests=BAYES_50,J_CHICKENPOX_33, J_CHICKENPOX_35,J_CHICKENPOX_36,J_CHICKENPOX_39,MIME_8BIT_HEADER autolearn=no version=3.3.0-r574664
Received: from rayleigh.systella.fr (rayleigh.systella.fr [213.41.184.253]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA4IZT0g020746 for ; Sun, 4 Nov 2007 10:35:32 -0800
Received: from [192.168.0.83] (fermat.systella.fr [192.168.0.83]) (authenticated bits=0) by rayleigh.systella.fr (8.14.1/8.14.1/Debian-9) with ESMTP id lA4IHsMT029212 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Sun, 4 Nov 2007 19:18:00 +0100
Message-ID: <472E0CD2.7050702@systella.fr>
Date: Sun, 04 Nov 2007 19:17:54 +0100
From: =?ISO-8859-1?Q?BERTRAND_Jo=EBl?=
User-Agent: Mozilla/5.0 (X11; U; Linux sparc64; fr-FR; rv:1.8.1.6) Gecko/20070802 Iceape/1.1.4 (Debian-1.1.4-1)
MIME-Version: 1.0
To: Michael Tokarev
CC: Justin Piszcz , linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: 2.6.23.1: mdadm/raid5 hung/d-state
References: <472DBF8C.2060508@msgid.tls.msk.ru> <472DDD78.7040002@msgid.tls.msk.ru>
In-Reply-To: <472DDD78.7040002@msgid.tls.msk.ru>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-3.1.8 (rayleigh.systella.fr [192.168.254.1]); Sun, 04 Nov 2007 19:18:02 +0100 (CET)
X-Scanned-By: MIMEDefang on 192.168.254.1
X-Virus-Scanned: ClamAV 0.91.2/4672/Sun Nov 4 03:38:42 2007 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 13546
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: joel.bertrand@systella.fr
Precedence: bulk
X-list: xfs

Michael Tokarev wrote:
> Justin Piszcz wrote:
>> On Sun, 4 Nov 2007, Michael Tokarev wrote:
> []
>>> The next time you come across something like that, do a SysRq-T dump and
>>> post that. It shows a stack trace of all processes - and in particular,
>>> where exactly each task is stuck.
>
>> Yes I got it before I rebooted, ran that and then dmesg > file.
>>
>> Here it is:
>>
>> [1172609.665902] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80
>> [1172609.668768] ffffffff80747dc0 ffff81015c3aa918 ffff810091c899b4 ffff810091c899a8
>
> That's only a partial list. All the kernel threads - which are the most
> important in this context - aren't shown. You ran out of dmesg buffer, and
> the most interesting entries were at the beginning. If your /var/log
> partition is working, the stuff should be in /var/log/kern.log or
> equivalent. If it's not working, there is a way to capture the info still,
> by stopping syslogd, cat'ing /proc/kmsg to some tmpfs file and scp'ing it
> elsewhere.

I reported the same bug some days ago. I can reproduce it without any
trouble :-(.

Configuration: 2.6.23 Linux kernel with iscsi-target on sparc64/smp (sun4v).

The following output was created by echo t > /proc/sysrq-trigger and
echo x > /proc/sysrq-trigger. It is a cut and paste from /var/log/syslog,
and I hope I haven't made any mistakes...

Nov 4 18:55:56 poulenc kernel: SysRq : Show State Nov 4 18:55:56 poulenc kernel: task PC stack pid father Nov 4 18:55:56 poulenc kernel: init S 00000000004c7d68 0 1 0 Nov 4 18:55:56 poulenc kernel: Call Trace: Nov 4 18:55:56 poulenc kernel: [000000000067ec0c] schedule_timeout+0x54/0xc0 Nov 4 18:55:56 poulenc kernel: [00000000004c7d68] do_select+0x3d0/0x420 Nov 4 18:55:56 poulenc kernel: [00000000004eccb8] compat_core_sys_select+0x160/0x200 Nov 4 18:55:56 poulenc kernel: [00000000004eef74] compat_sys_select+0xbc/0x1a0 Nov 4 18:55:56 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:55:56 poulenc kernel: [00000000000150b8] 0x150c0 Nov 4 18:55:56 poulenc kernel: kthreadd S 00000000004273d0 0 2 0 Nov 4 18:55:56 poulenc kernel: Call Trace: Nov 4 18:55:56 poulenc kernel: [0000000000478fe8] kthreadd+0x1b0/0x1c0 Nov 4 18:55:56 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:56 poulenc kernel: [000000000067d404] rest_init+0x2c/0x60 Nov 4 18:55:56 poulenc kernel: migration/0 S 0000000000478ce0 0 3 2 Nov 4 18:55:56 poulenc kernel: Call Trace: Nov 4 18:55:56 poulenc kernel: [000000000045e60c] migration_thread+0x174/0x360 Nov 4 18:55:56 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:55:56 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:56 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:55:56 poulenc kernel: ksoftirqd/0 S 0000000000478ce0 0 4 2 Nov 4 18:55:56 poulenc kernel: Call Trace: Nov 4 18:55:56 poulenc kernel: [00000000004683c0] ksoftirqd+0xa8/0xc0 Nov 4 18:55:56 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:55:56 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:57 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:55:57 poulenc kernel: watchdog/0 S 0000000000478ce0 0 5 2 Nov 4 18:55:57 poulenc kernel: Call Trace: Nov 4 18:55:57 poulenc kernel: [000000000048f9e0] watchdog+0x48/0x80 Nov 4 18:55:57 poulenc kernel: 
[0000000000478ce0] kthread+0x48/0x80 Nov 4 18:55:57 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:57 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:55:57 poulenc kernel: migration/1 S 0000000000478ce0 0 6 2 Nov 4 18:55:57 poulenc kernel: Call Trace: Nov 4 18:55:57 poulenc kernel: [000000000045e60c] migration_thread+0x174/0x360 Nov 4 18:55:57 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:55:57 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:57 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:55:57 poulenc kernel: ksoftirqd/1 S 0000000000478ce0 0 7 2 Nov 4 18:55:57 poulenc kernel: Call Trace: Nov 4 18:55:57 poulenc kernel: [00000000004683c0] ksoftirqd+0xa8/0xc0 Nov 4 18:55:57 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:55:57 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:57 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:55:57 poulenc kernel: watchdog/1 S 0000000000478ce0 0 8 2 Nov 4 18:55:57 poulenc kernel: Call Trace: Nov 4 18:55:57 poulenc kernel: [000000000048f9e0] watchdog+0x48/0x80 Nov 4 18:55:57 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:55:57 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:57 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:55:57 poulenc kernel: migration/2 S 0000000000478ce0 0 9 2 Nov 4 18:55:57 poulenc kernel: Call Trace: Nov 4 18:55:57 poulenc kernel: [000000000045e60c] migration_thread+0x174/0x360 Nov 4 18:55:57 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:55:57 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:57 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:55:57 poulenc kernel: ksoftirqd/2 S 0000000000478ce0 0 10 2 Nov 4 18:55:57 poulenc kernel: Call Trace: Nov 4 18:55:57 poulenc kernel: [00000000004683c0] ksoftirqd+0xa8/0xc0 Nov 4 18:55:57 poulenc kernel: 
[0000000000478ce0] kthread+0x48/0x80 Nov 4 18:55:57 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:57 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:55:57 poulenc kernel: watchdog/2 S 0000000000478ce0 0 11 2 Nov 4 18:55:57 poulenc kernel: Call Trace: Nov 4 18:55:57 poulenc kernel: [000000000048f9e0] watchdog+0x48/0x80 Nov 4 18:55:57 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:55:57 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:57 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:55:57 poulenc kernel: migration/3 R running task 0 12 2 Nov 4 18:55:57 poulenc kernel: ksoftirqd/3 S 0000000000478ce0 0 13 2 Nov 4 18:55:57 poulenc kernel: Call Trace: Nov 4 18:55:57 poulenc kernel: [00000000004683c0] ksoftirqd+0xa8/0xc0 Nov 4 18:55:57 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:55:57 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:58 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:55:58 poulenc kernel: watchdog/3 S 0000000000478ce0 0 14 2 Nov 4 18:55:58 poulenc kernel: Call Trace: Nov 4 18:55:58 poulenc kernel: [000000000048f9e0] watchdog+0x48/0x80 Nov 4 18:55:58 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:55:58 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:58 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:55:58 poulenc kernel: migration/4 S 0000000000478ce0 0 15 2 Nov 4 18:55:58 poulenc kernel: Call Trace: Nov 4 18:55:58 poulenc kernel: [000000000045e60c] migration_thread+0x174/0x360 Nov 4 18:55:58 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:55:58 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:58 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:55:58 poulenc kernel: ksoftirqd/4 S 0000000000478ce0 0 16 2 Nov 4 18:55:58 poulenc kernel: Call Trace: Nov 4 18:55:58 poulenc kernel: [00000000004683c0] 
ksoftirqd+0xa8/0xc0 Nov 4 18:55:58 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:55:58 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:58 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:55:58 poulenc kernel: watchdog/4 S 0000000000478ce0 0 17 2 Nov 4 18:55:58 poulenc kernel: Call Trace: Nov 4 18:55:58 poulenc kernel: [000000000048f9e0] watchdog+0x48/0x80 Nov 4 18:55:58 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:55:58 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:58 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:55:58 poulenc kernel: migration/5 S 0000000000478ce0 0 18 2 Nov 4 18:55:58 poulenc kernel: Call Trace: Nov 4 18:55:58 poulenc kernel: [000000000045e60c] migration_thread+0x174/0x360 Nov 4 18:55:58 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:55:58 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:58 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:55:58 poulenc kernel: ksoftirqd/5 S 0000000000478ce0 0 19 2 Nov 4 18:55:58 poulenc kernel: Call Trace: Nov 4 18:55:58 poulenc kernel: [00000000004683c0] ksoftirqd+0xa8/0xc0 Nov 4 18:55:58 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:55:58 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:58 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:55:58 poulenc kernel: watchdog/5 S 0000000000478ce0 0 20 2 Nov 4 18:55:58 poulenc kernel: Call Trace: Nov 4 18:55:58 poulenc kernel: [000000000048f9e0] watchdog+0x48/0x80 Nov 4 18:55:58 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:55:58 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:58 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:55:58 poulenc kernel: migration/6 S 0000000000478ce0 0 21 2 Nov 4 18:55:58 poulenc kernel: Call Trace: Nov 4 18:55:58 poulenc kernel: [000000000045e60c] 
migration_thread+0x174/0x360 Nov 4 18:55:58 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:55:58 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:59 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:55:59 poulenc kernel: ksoftirqd/6 S 0000000000478ce0 0 22 2 Nov 4 18:55:59 poulenc kernel: Call Trace: Nov 4 18:55:59 poulenc kernel: [00000000004683c0] ksoftirqd+0xa8/0xc0 Nov 4 18:55:59 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:55:59 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:59 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:55:59 poulenc kernel: watchdog/6 S 0000000000478ce0 0 23 2 Nov 4 18:55:59 poulenc kernel: Call Trace: Nov 4 18:55:59 poulenc kernel: [000000000048f9e0] watchdog+0x48/0x80 Nov 4 18:55:59 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:55:59 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:59 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:55:59 poulenc kernel: migration/7 S 0000000000478ce0 0 24 2 Nov 4 18:55:59 poulenc kernel: Call Trace: Nov 4 18:55:59 poulenc kernel: [000000000045e60c] migration_thread+0x174/0x360 Nov 4 18:55:59 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:55:59 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:59 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:55:59 poulenc kernel: ksoftirqd/7 S 0000000000478ce0 0 25 2 Nov 4 18:55:59 poulenc kernel: Call Trace: Nov 4 18:55:59 poulenc kernel: [00000000004683c0] ksoftirqd+0xa8/0xc0 Nov 4 18:55:59 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:55:59 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:59 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:55:59 poulenc kernel: watchdog/7 S 0000000000478ce0 0 26 2 Nov 4 18:55:59 poulenc kernel: Call Trace: Nov 4 18:55:59 poulenc kernel: [000000000048f9e0] 
watchdog+0x48/0x80 Nov 4 18:55:59 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:55:59 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:59 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:55:59 poulenc kernel: migration/8 S 0000000000478ce0 0 27 2 Nov 4 18:55:59 poulenc kernel: Call Trace: Nov 4 18:55:59 poulenc kernel: [000000000045e60c] migration_thread+0x174/0x360 Nov 4 18:55:59 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:55:59 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:59 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:55:59 poulenc kernel: ksoftirqd/8 S 0000000000478ce0 0 28 2 Nov 4 18:55:59 poulenc kernel: Call Trace: Nov 4 18:55:59 poulenc kernel: [00000000004683c0] ksoftirqd+0xa8/0xc0 Nov 4 18:55:59 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:55:59 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:59 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:55:59 poulenc kernel: watchdog/8 S 0000000000478ce0 0 29 2 Nov 4 18:55:59 poulenc kernel: Call Trace: Nov 4 18:55:59 poulenc kernel: [000000000048f9e0] watchdog+0x48/0x80 Nov 4 18:55:59 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:55:59 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:55:59 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:00 poulenc kernel: migration/9 S 0000000000478ce0 0 30 2 Nov 4 18:56:00 poulenc kernel: Call Trace: Nov 4 18:56:00 poulenc kernel: [000000000045e60c] migration_thread+0x174/0x360 Nov 4 18:56:00 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:00 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:00 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:00 poulenc kernel: ksoftirqd/9 S 0000000000478ce0 0 31 2 Nov 4 18:56:00 poulenc kernel: Call Trace: Nov 4 18:56:00 poulenc kernel: [00000000004683c0] 
ksoftirqd+0xa8/0xc0 Nov 4 18:56:00 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:00 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:00 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:00 poulenc kernel: watchdog/9 S 0000000000478ce0 0 32 2 Nov 4 18:56:00 poulenc kernel: Call Trace: Nov 4 18:56:00 poulenc kernel: [000000000048f9e0] watchdog+0x48/0x80 Nov 4 18:56:00 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:00 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:00 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:00 poulenc kernel: migration/10 S 0000000000478ce0 0 33 2 Nov 4 18:56:00 poulenc kernel: Call Trace: Nov 4 18:56:00 poulenc kernel: [000000000045e60c] migration_thread+0x174/0x360 Nov 4 18:56:00 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:00 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:00 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:00 poulenc kernel: ksoftirqd/10 S 0000000000478ce0 0 34 2 Nov 4 18:56:00 poulenc kernel: Call Trace: Nov 4 18:56:00 poulenc kernel: [00000000004683c0] ksoftirqd+0xa8/0xc0 Nov 4 18:56:00 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:00 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:00 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:00 poulenc kernel: watchdog/10 S 0000000000478ce0 0 35 2 Nov 4 18:56:00 poulenc kernel: Call Trace: Nov 4 18:56:00 poulenc kernel: [000000000048f9e0] watchdog+0x48/0x80 Nov 4 18:56:00 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:00 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:00 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:00 poulenc kernel: migration/11 S 0000000000478ce0 0 36 2 Nov 4 18:56:00 poulenc kernel: Call Trace: Nov 4 18:56:00 poulenc kernel: [000000000045e60c] 
migration_thread+0x174/0x360 Nov 4 18:56:00 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:00 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:00 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:00 poulenc kernel: ksoftirqd/11 S 0000000000478ce0 0 37 2 Nov 4 18:56:00 poulenc kernel: Call Trace: Nov 4 18:56:00 poulenc kernel: [00000000004683c0] ksoftirqd+0xa8/0xc0 Nov 4 18:56:00 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:01 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:01 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:01 poulenc kernel: watchdog/11 S 0000000000478ce0 0 38 2 Nov 4 18:56:01 poulenc kernel: Call Trace: Nov 4 18:56:01 poulenc kernel: [000000000048f9e0] watchdog+0x48/0x80 Nov 4 18:56:01 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:01 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:01 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:01 poulenc kernel: migration/12 S 0000000000478ce0 0 39 2 Nov 4 18:56:01 poulenc kernel: Call Trace: Nov 4 18:56:01 poulenc kernel: [000000000045e60c] migration_thread+0x174/0x360 Nov 4 18:56:01 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:01 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:01 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:01 poulenc kernel: ksoftirqd/12 S 0000000000478ce0 0 40 2 Nov 4 18:56:01 poulenc kernel: Call Trace: Nov 4 18:56:01 poulenc kernel: [00000000004683c0] ksoftirqd+0xa8/0xc0 Nov 4 18:56:01 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:01 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:01 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:01 poulenc kernel: watchdog/12 S 0000000000478ce0 0 41 2 Nov 4 18:56:01 poulenc kernel: Call Trace: Nov 4 18:56:01 poulenc kernel: [000000000048f9e0] 
watchdog+0x48/0x80 Nov 4 18:56:01 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:01 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:01 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:01 poulenc kernel: migration/13 S 0000000000478ce0 0 42 2 Nov 4 18:56:01 poulenc kernel: Call Trace: Nov 4 18:56:01 poulenc kernel: [000000000045e60c] migration_thread+0x174/0x360 Nov 4 18:56:01 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:01 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:01 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:01 poulenc kernel: ksoftirqd/13 S 0000000000478ce0 0 43 2 Nov 4 18:56:01 poulenc kernel: Call Trace: Nov 4 18:56:01 poulenc kernel: [00000000004683c0] ksoftirqd+0xa8/0xc0 Nov 4 18:56:01 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:01 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:01 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:01 poulenc kernel: watchdog/13 S 0000000000478ce0 0 44 2 Nov 4 18:56:01 poulenc kernel: Call Trace: Nov 4 18:56:01 poulenc kernel: [000000000048f9e0] watchdog+0x48/0x80 Nov 4 18:56:01 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:01 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:01 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:02 poulenc kernel: migration/14 S 0000000000478ce0 0 45 2 Nov 4 18:56:02 poulenc kernel: Call Trace: Nov 4 18:56:02 poulenc kernel: [000000000045e60c] migration_thread+0x174/0x360 Nov 4 18:56:02 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:02 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:02 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:02 poulenc kernel: ksoftirqd/14 S 0000000000478ce0 0 46 2 Nov 4 18:56:02 poulenc kernel: Call Trace: Nov 4 18:56:02 poulenc kernel: [00000000004683c0] 
ksoftirqd+0xa8/0xc0 Nov 4 18:56:02 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:02 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:02 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:02 poulenc kernel: watchdog/14 S 0000000000478ce0 0 47 2 Nov 4 18:56:02 poulenc kernel: Call Trace: Nov 4 18:56:02 poulenc kernel: [000000000048f9e0] watchdog+0x48/0x80 Nov 4 18:56:02 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:02 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:02 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:02 poulenc kernel: migration/15 S 0000000000478ce0 0 48 2 Nov 4 18:56:02 poulenc kernel: Call Trace: Nov 4 18:56:02 poulenc kernel: [000000000045e60c] migration_thread+0x174/0x360 Nov 4 18:56:02 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:02 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:02 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:02 poulenc kernel: ksoftirqd/15 S 0000000000478ce0 0 49 2 Nov 4 18:56:02 poulenc kernel: Call Trace: Nov 4 18:56:02 poulenc kernel: [00000000004683c0] ksoftirqd+0xa8/0xc0 Nov 4 18:56:02 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:02 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:02 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:02 poulenc kernel: watchdog/15 S 0000000000478ce0 0 50 2 Nov 4 18:56:02 poulenc kernel: Call Trace: Nov 4 18:56:02 poulenc kernel: [000000000048f9e0] watchdog+0x48/0x80 Nov 4 18:56:02 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:02 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:02 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:02 poulenc kernel: migration/16 S 0000000000478ce0 0 51 2 Nov 4 18:56:03 poulenc kernel: Call Trace: Nov 4 18:56:03 poulenc kernel: [000000000045e60c] 
migration_thread+0x174/0x360 Nov 4 18:56:03 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:03 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:03 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:03 poulenc kernel: ksoftirqd/16 S 0000000000478ce0 0 52 2 Nov 4 18:56:03 poulenc kernel: Call Trace: Nov 4 18:56:03 poulenc kernel: [00000000004683c0] ksoftirqd+0xa8/0xc0 Nov 4 18:56:03 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:03 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:03 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:03 poulenc kernel: watchdog/16 S 0000000000478ce0 0 53 2 Nov 4 18:56:03 poulenc kernel: Call Trace: Nov 4 18:56:03 poulenc kernel: [000000000048f9e0] watchdog+0x48/0x80 Nov 4 18:56:03 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:03 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:03 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:03 poulenc kernel: migration/17 R running task 0 54 2 Nov 4 18:56:03 poulenc kernel: ksoftirqd/17 R running task 0 55 2 Nov 4 18:56:03 poulenc kernel: watchdog/17 S 0000000000478ce0 0 56 2 Nov 4 18:56:03 poulenc kernel: Call Trace: Nov 4 18:56:03 poulenc kernel: [000000000048f9e0] watchdog+0x48/0x80 Nov 4 18:56:03 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:03 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:03 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:03 poulenc kernel: migration/18 S 0000000000478ce0 0 57 2 Nov 4 18:56:03 poulenc kernel: Call Trace: Nov 4 18:56:03 poulenc kernel: [000000000045e60c] migration_thread+0x174/0x360 Nov 4 18:56:03 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:03 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:03 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:03 poulenc kernel: 
ksoftirqd/18 S 0000000000478ce0 0 58 2 Nov 4 18:56:03 poulenc kernel: Call Trace: Nov 4 18:56:03 poulenc kernel: [00000000004683c0] ksoftirqd+0xa8/0xc0 Nov 4 18:56:03 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:04 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:04 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:04 poulenc kernel: watchdog/18 S 0000000000478ce0 0 59 2 Nov 4 18:56:04 poulenc kernel: Call Trace: Nov 4 18:56:04 poulenc kernel: [000000000048f9e0] watchdog+0x48/0x80 Nov 4 18:56:04 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:04 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:04 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:04 poulenc kernel: migration/19 S 0000000000478ce0 0 60 2 Nov 4 18:56:04 poulenc kernel: Call Trace: Nov 4 18:56:04 poulenc kernel: [000000000045e60c] migration_thread+0x174/0x360 Nov 4 18:56:04 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:04 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:04 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:04 poulenc kernel: ksoftirqd/19 S 0000000000478ce0 0 61 2 Nov 4 18:56:04 poulenc kernel: Call Trace: Nov 4 18:56:04 poulenc kernel: [00000000004683c0] ksoftirqd+0xa8/0xc0 Nov 4 18:56:04 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:04 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:04 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:04 poulenc kernel: watchdog/19 S 0000000000478ce0 0 62 2 Nov 4 18:56:04 poulenc kernel: Call Trace: Nov 4 18:56:04 poulenc kernel: [000000000048f9e0] watchdog+0x48/0x80 Nov 4 18:56:04 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:04 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:04 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:04 poulenc kernel: migration/20 S 
0000000000478ce0 0 63 2 Nov 4 18:56:04 poulenc kernel: Call Trace: Nov 4 18:56:04 poulenc kernel: [000000000045e60c] migration_thread+0x174/0x360 Nov 4 18:56:04 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:04 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:04 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:04 poulenc kernel: ksoftirqd/20 S 0000000000478ce0 0 64 2 Nov 4 18:56:04 poulenc kernel: Call Trace: Nov 4 18:56:04 poulenc kernel: [00000000004683c0] ksoftirqd+0xa8/0xc0 Nov 4 18:56:04 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:04 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:04 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:04 poulenc kernel: watchdog/20 S 0000000000478ce0 0 65 2 Nov 4 18:56:04 poulenc kernel: Call Trace: Nov 4 18:56:04 poulenc kernel: [000000000048f9e0] watchdog+0x48/0x80 Nov 4 18:56:04 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:04 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:04 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:04 poulenc kernel: migration/21 S 0000000000478ce0 0 66 2 Nov 4 18:56:04 poulenc kernel: Call Trace: Nov 4 18:56:04 poulenc kernel: [000000000045e60c] migration_thread+0x174/0x360 Nov 4 18:56:04 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:04 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:05 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:05 poulenc kernel: ksoftirqd/21 S 0000000000478ce0 0 67 2 Nov 4 18:56:05 poulenc kernel: Call Trace: Nov 4 18:56:05 poulenc kernel: [00000000004683c0] ksoftirqd+0xa8/0xc0 Nov 4 18:56:05 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:05 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:05 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:05 poulenc kernel: watchdog/21 S 
0000000000478ce0 0 68 2 Nov 4 18:56:05 poulenc kernel: Call Trace: Nov 4 18:56:05 poulenc kernel: [000000000048f9e0] watchdog+0x48/0x80 Nov 4 18:56:05 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:05 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:05 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:05 poulenc kernel: migration/22 S 0000000000478ce0 0 69 2 Nov 4 18:56:05 poulenc kernel: Call Trace: Nov 4 18:56:05 poulenc kernel: [000000000045e60c] migration_thread+0x174/0x360 Nov 4 18:56:05 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:05 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:05 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:05 poulenc kernel: ksoftirqd/22 S 0000000000478ce0 0 70 2 Nov 4 18:56:05 poulenc kernel: Call Trace: Nov 4 18:56:05 poulenc kernel: [00000000004683c0] ksoftirqd+0xa8/0xc0 Nov 4 18:56:05 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:05 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:05 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:05 poulenc kernel: watchdog/22 S 0000000000478ce0 0 71 2 Nov 4 18:56:05 poulenc kernel: Call Trace: Nov 4 18:56:05 poulenc kernel: [000000000048f9e0] watchdog+0x48/0x80 Nov 4 18:56:05 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:05 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:05 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:05 poulenc kernel: migration/23 S 0000000000478ce0 0 72 2 Nov 4 18:56:05 poulenc kernel: Call Trace: Nov 4 18:56:05 poulenc kernel: [000000000045e60c] migration_thread+0x174/0x360 Nov 4 18:56:05 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:05 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:05 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:05 poulenc kernel: ksoftirqd/23 S 
0000000000478ce0 0 73 2 Nov 4 18:56:05 poulenc kernel: Call Trace: Nov 4 18:56:05 poulenc kernel: [00000000004683c0] ksoftirqd+0xa8/0xc0 Nov 4 18:56:05 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:05 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:06 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:06 poulenc kernel: watchdog/23 S 0000000000478ce0 0 74 2 Nov 4 18:56:06 poulenc kernel: Call Trace: Nov 4 18:56:06 poulenc kernel: [000000000048f9e0] watchdog+0x48/0x80 Nov 4 18:56:06 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:06 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:06 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:06 poulenc kernel: events/0 S 0000000000478ce0 0 75 2 Nov 4 18:56:06 poulenc kernel: Call Trace: Nov 4 18:56:06 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:06 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:06 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:06 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:06 poulenc kernel: events/1 S 0000000000478ce0 0 76 2 Nov 4 18:56:06 poulenc kernel: Call Trace: Nov 4 18:56:06 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:06 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:06 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:06 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:06 poulenc kernel: events/2 S 0000000000478ce0 0 77 2 Nov 4 18:56:06 poulenc kernel: Call Trace: Nov 4 18:56:06 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:06 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:06 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:06 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:06 poulenc kernel: events/3 R running task 0 78 2 Nov 4 
18:56:06 poulenc kernel: events/4 S 0000000000478ce0 0 79 2 Nov 4 18:56:06 poulenc kernel: Call Trace: Nov 4 18:56:06 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:06 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:06 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:06 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:06 poulenc kernel: events/5 S 0000000000478ce0 0 80 2 Nov 4 18:56:06 poulenc kernel: Call Trace: Nov 4 18:56:06 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:06 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:06 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:06 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:06 poulenc kernel: events/6 S 0000000000478ce0 0 81 2 Nov 4 18:56:06 poulenc kernel: Call Trace: Nov 4 18:56:06 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:06 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:06 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:06 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:06 poulenc kernel: events/7 S 0000000000478ce0 0 82 2 Nov 4 18:56:06 poulenc kernel: Call Trace: Nov 4 18:56:06 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:06 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:06 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:06 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:07 poulenc kernel: events/8 S 0000000000478ce0 0 83 2 Nov 4 18:56:07 poulenc kernel: Call Trace: Nov 4 18:56:07 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:07 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:07 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:07 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:07 poulenc 
kernel: events/9 S 0000000000478ce0 0 84 2 Nov 4 18:56:07 poulenc kernel: Call Trace: Nov 4 18:56:07 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:07 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:07 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:07 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:07 poulenc kernel: events/10 S 0000000000478ce0 0 85 2 Nov 4 18:56:07 poulenc kernel: Call Trace: Nov 4 18:56:07 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:07 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:07 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:07 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:07 poulenc kernel: events/11 S 0000000000478ce0 0 86 2 Nov 4 18:56:07 poulenc kernel: Call Trace: Nov 4 18:56:07 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:07 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:07 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:07 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:07 poulenc kernel: events/12 S 0000000000478ce0 0 87 2 Nov 4 18:56:07 poulenc kernel: Call Trace: Nov 4 18:56:07 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:07 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:07 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:07 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:07 poulenc kernel: events/13 S 0000000000478ce0 0 88 2 Nov 4 18:56:07 poulenc kernel: Call Trace: Nov 4 18:56:07 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:07 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:07 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:07 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:07 poulenc kernel: 
events/14 S 0000000000478ce0 0 89 2 Nov 4 18:56:07 poulenc kernel: Call Trace: Nov 4 18:56:07 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:07 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:07 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:07 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:07 poulenc kernel: events/15 S 0000000000478ce0 0 90 2 Nov 4 18:56:07 poulenc kernel: Call Trace: Nov 4 18:56:07 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:07 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:07 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:07 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:07 poulenc kernel: events/16 S 0000000000478ce0 0 91 2 Nov 4 18:56:08 poulenc kernel: Call Trace: Nov 4 18:56:08 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:08 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:08 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:08 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:08 poulenc kernel: events/17 S 0000000000478ce0 0 92 2 Nov 4 18:56:08 poulenc kernel: Call Trace: Nov 4 18:56:08 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:08 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:08 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:08 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:08 poulenc kernel: events/18 S 0000000000478ce0 0 93 2 Nov 4 18:56:08 poulenc kernel: Call Trace: Nov 4 18:56:08 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:08 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:08 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:08 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:08 poulenc kernel: events/19 S 
0000000000478ce0 0 94 2 Nov 4 18:56:08 poulenc kernel: Call Trace: Nov 4 18:56:08 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:08 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:08 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:08 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:08 poulenc kernel: events/20 S 0000000000478ce0 0 95 2 Nov 4 18:56:08 poulenc kernel: Call Trace: Nov 4 18:56:08 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:08 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:08 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:08 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:08 poulenc kernel: events/21 S 0000000000478ce0 0 96 2 Nov 4 18:56:08 poulenc kernel: Call Trace: Nov 4 18:56:08 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:08 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:08 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:08 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:08 poulenc kernel: events/22 S 0000000000478ce0 0 97 2 Nov 4 18:56:08 poulenc kernel: Call Trace: Nov 4 18:56:08 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:08 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:08 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:08 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:08 poulenc kernel: events/23 S 0000000000478ce0 0 98 2 Nov 4 18:56:08 poulenc kernel: Call Trace: Nov 4 18:56:08 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:08 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:08 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:08 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:08 poulenc kernel: khelper S 0000000000478ce0 
0 99 2 Nov 4 18:56:08 poulenc kernel: Call Trace: Nov 4 18:56:08 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:08 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:08 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:08 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:09 poulenc kernel: kblockd/0 S 0000000000478ce0 0 247 2 Nov 4 18:56:09 poulenc kernel: Call Trace: Nov 4 18:56:09 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:09 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:09 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:09 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:09 poulenc kernel: kblockd/1 S 0000000000478ce0 0 248 2 Nov 4 18:56:09 poulenc kernel: Call Trace: Nov 4 18:56:09 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:09 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:09 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:09 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:09 poulenc kernel: kblockd/2 S 0000000000478ce0 0 249 2 Nov 4 18:56:09 poulenc kernel: Call Trace: Nov 4 18:56:09 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:09 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:09 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:09 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:09 poulenc kernel: kblockd/3 R running task 0 250 2 Nov 4 18:56:09 poulenc kernel: kblockd/4 S 0000000000478ce0 0 251 2 Nov 4 18:56:09 poulenc kernel: Call Trace: Nov 4 18:56:09 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:09 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:09 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:09 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 
18:56:09 poulenc kernel: kblockd/5 S 0000000000478ce0 0 252 2 Nov 4 18:56:09 poulenc kernel: Call Trace: Nov 4 18:56:09 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:09 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:09 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:09 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:09 poulenc kernel: kblockd/6 S 0000000000478ce0 0 253 2 Nov 4 18:56:09 poulenc kernel: Call Trace: Nov 4 18:56:09 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:09 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:09 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:10 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:10 poulenc kernel: kblockd/7 S 0000000000478ce0 0 254 2 Nov 4 18:56:10 poulenc kernel: Call Trace: Nov 4 18:56:10 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:10 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:10 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:10 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:10 poulenc kernel: kblockd/8 S 0000000000478ce0 0 255 2 Nov 4 18:56:10 poulenc kernel: Call Trace: Nov 4 18:56:10 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:10 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:10 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:10 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:10 poulenc kernel: kblockd/9 S 0000000000478ce0 0 256 2 Nov 4 18:56:10 poulenc kernel: Call Trace: Nov 4 18:56:10 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:10 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:10 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:10 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:10 
poulenc kernel: kblockd/10 S 0000000000478ce0 0 257 2 Nov 4 18:56:10 poulenc kernel: Call Trace: Nov 4 18:56:10 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:10 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:10 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:10 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:10 poulenc kernel: kblockd/11 S 0000000000478ce0 0 258 2 Nov 4 18:56:10 poulenc kernel: Call Trace: Nov 4 18:56:10 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:10 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:10 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:10 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:10 poulenc kernel: kblockd/12 S 0000000000478ce0 0 259 2 Nov 4 18:56:10 poulenc kernel: Call Trace: Nov 4 18:56:10 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:10 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:10 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:10 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:10 poulenc kernel: kblockd/13 S 0000000000478ce0 0 260 2 Nov 4 18:56:10 poulenc kernel: Call Trace: Nov 4 18:56:10 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:10 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:10 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:10 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:10 poulenc kernel: kblockd/14 S 0000000000478ce0 0 261 2 Nov 4 18:56:10 poulenc kernel: Call Trace: Nov 4 18:56:10 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:10 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:10 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:11 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:11 
poulenc kernel: kblockd/15 S 0000000000478ce0 0 262 2 Nov 4 18:56:11 poulenc kernel: Call Trace: Nov 4 18:56:11 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:11 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:11 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:11 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:11 poulenc kernel: kblockd/16 S 0000000000478ce0 0 263 2 Nov 4 18:56:11 poulenc kernel: Call Trace: Nov 4 18:56:11 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:11 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:11 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:11 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:11 poulenc kernel: kblockd/17 S 0000000000478ce0 0 264 2 Nov 4 18:56:11 poulenc kernel: Call Trace: Nov 4 18:56:11 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:11 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:11 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:11 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:11 poulenc kernel: kblockd/18 S 0000000000478ce0 0 265 2 Nov 4 18:56:11 poulenc kernel: Call Trace: Nov 4 18:56:11 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:11 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:11 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:11 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:11 poulenc kernel: kblockd/19 S 0000000000478ce0 0 266 2 Nov 4 18:56:11 poulenc kernel: Call Trace: Nov 4 18:56:11 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:11 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:11 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:11 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:11 
poulenc kernel: kblockd/20 S 0000000000478ce0 0 267 2 Nov 4 18:56:11 poulenc kernel: Call Trace: Nov 4 18:56:11 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:11 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:11 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:11 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:11 poulenc kernel: kblockd/21 S 0000000000478ce0 0 268 2 Nov 4 18:56:11 poulenc kernel: Call Trace: Nov 4 18:56:11 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:12 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:12 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:12 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:12 poulenc kernel: kblockd/22 S 0000000000478ce0 0 269 2 Nov 4 18:56:12 poulenc kernel: Call Trace: Nov 4 18:56:12 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:12 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:12 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:12 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:12 poulenc kernel: kblockd/23 S 0000000000478ce0 0 270 2 Nov 4 18:56:12 poulenc kernel: Call Trace: Nov 4 18:56:12 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:12 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:12 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:12 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:12 poulenc kernel: pdflush S 0000000000478ce0 0 294 2 Nov 4 18:56:12 poulenc kernel: Call Trace: Nov 4 18:56:12 poulenc kernel: [000000000049a420] pdflush+0xc8/0x200 Nov 4 18:56:12 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:12 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:12 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:12 poulenc 
kernel: pdflush S 0000000000478ce0 0 295 2 Nov 4 18:56:12 poulenc kernel: Call Trace: Nov 4 18:56:12 poulenc kernel: [000000000049a420] pdflush+0xc8/0x200 Nov 4 18:56:12 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:12 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:12 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:12 poulenc kernel: kswapd0 S 0000000000478ce0 0 296 2 Nov 4 18:56:12 poulenc kernel: Call Trace: Nov 4 18:56:12 poulenc kernel: [000000000049e778] kswapd+0x540/0x560 Nov 4 18:56:12 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:12 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:12 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:12 poulenc kernel: aio/0 S 0000000000478ce0 0 297 2 Nov 4 18:56:12 poulenc kernel: Call Trace: Nov 4 18:56:12 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:12 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:12 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:12 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:12 poulenc kernel: aio/1 S 0000000000478ce0 0 298 2 Nov 4 18:56:12 poulenc kernel: Call Trace: Nov 4 18:56:12 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:13 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:13 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:13 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:13 poulenc kernel: aio/2 S 0000000000478ce0 0 299 2 Nov 4 18:56:13 poulenc kernel: Call Trace: Nov 4 18:56:13 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:13 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:13 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:13 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:13 poulenc kernel: aio/3 S 0000000000478ce0 0 
300 2 Nov 4 18:56:13 poulenc kernel: Call Trace: Nov 4 18:56:13 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:13 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:13 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:13 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:13 poulenc kernel: aio/4 S 0000000000478ce0 0 301 2 Nov 4 18:56:13 poulenc kernel: Call Trace: Nov 4 18:56:13 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:13 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:13 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:13 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:13 poulenc kernel: aio/5 S 0000000000478ce0 0 302 2 Nov 4 18:56:13 poulenc kernel: Call Trace: Nov 4 18:56:13 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:13 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:13 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:13 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:13 poulenc kernel: aio/6 S 0000000000478ce0 0 303 2 Nov 4 18:56:13 poulenc kernel: Call Trace: Nov 4 18:56:13 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:13 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:13 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:13 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:13 poulenc kernel: aio/7 S 0000000000478ce0 0 304 2 Nov 4 18:56:13 poulenc kernel: Call Trace: Nov 4 18:56:13 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:13 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:13 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:13 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:14 poulenc kernel: aio/8 S 0000000000478ce0 0 305 2 Nov 4 18:56:14 poulenc 
kernel: Call Trace: Nov 4 18:56:14 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:14 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:14 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:14 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:14 poulenc kernel: aio/9 S 0000000000478ce0 0 306 2 Nov 4 18:56:14 poulenc kernel: Call Trace: Nov 4 18:56:14 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:14 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:14 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:14 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:14 poulenc kernel: aio/10 S 0000000000478ce0 0 307 2 Nov 4 18:56:14 poulenc kernel: Call Trace: Nov 4 18:56:14 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:14 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:14 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:14 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:14 poulenc kernel: aio/11 S 0000000000478ce0 0 308 2 Nov 4 18:56:14 poulenc kernel: Call Trace: Nov 4 18:56:14 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:14 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:14 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:14 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:14 poulenc kernel: aio/12 S 0000000000478ce0 0 309 2 Nov 4 18:56:14 poulenc kernel: Call Trace: Nov 4 18:56:15 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:15 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:15 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:15 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:15 poulenc kernel: aio/13 S 0000000000478ce0 0 310 2 Nov 4 18:56:15 poulenc kernel: Call Trace: Nov 4 
18:56:15 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:15 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:15 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:15 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:15 poulenc kernel: aio/14 S 0000000000478ce0 0 311 2 Nov 4 18:56:15 poulenc kernel: Call Trace: Nov 4 18:56:15 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:15 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:15 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:15 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:15 poulenc kernel: aio/15 S 0000000000478ce0 0 312 2 Nov 4 18:56:15 poulenc kernel: Call Trace: Nov 4 18:56:15 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:15 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:15 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:15 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:15 poulenc kernel: aio/16 S 0000000000478ce0 0 313 2 Nov 4 18:56:15 poulenc kernel: Call Trace: Nov 4 18:56:15 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:15 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:15 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:15 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:15 poulenc kernel: aio/17 S 0000000000478ce0 0 314 2 Nov 4 18:56:15 poulenc kernel: Call Trace: Nov 4 18:56:15 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:15 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:15 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:15 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:15 poulenc kernel: aio/18 S 0000000000478ce0 0 315 2 Nov 4 18:56:15 poulenc kernel: Call Trace: Nov 4 18:56:15 poulenc kernel: 
[0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:15 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:15 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:16 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:16 poulenc kernel: aio/19 S 0000000000478ce0 0 316 2 Nov 4 18:56:16 poulenc kernel: Call Trace: Nov 4 18:56:16 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:16 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:16 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:16 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:16 poulenc kernel: aio/20 S 0000000000478ce0 0 317 2 Nov 4 18:56:16 poulenc kernel: Call Trace: Nov 4 18:56:16 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:16 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:16 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:16 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:16 poulenc kernel: aio/21 S 0000000000478ce0 0 318 2 Nov 4 18:56:16 poulenc kernel: Call Trace: Nov 4 18:56:16 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:16 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:16 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:16 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:16 poulenc kernel: aio/22 S 0000000000478ce0 0 319 2 Nov 4 18:56:16 poulenc kernel: Call Trace: Nov 4 18:56:16 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:16 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:16 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:16 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:16 poulenc kernel: aio/23 S 0000000000478ce0 0 320 2 Nov 4 18:56:16 poulenc kernel: Call Trace: Nov 4 18:56:16 poulenc kernel: [0000000000474c5c] 
worker_thread+0xa4/0xe0 Nov 4 18:56:16 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:16 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:16 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:16 poulenc kernel: scsi_tgtd/0 S 0000000000478ce0 0 911 2 Nov 4 18:56:16 poulenc kernel: Call Trace: Nov 4 18:56:16 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:16 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:16 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:17 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:17 poulenc kernel: scsi_tgtd/1 S 0000000000478ce0 0 912 2 Nov 4 18:56:17 poulenc kernel: Call Trace: Nov 4 18:56:17 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:17 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:17 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:17 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:17 poulenc kernel: scsi_tgtd/2 S 0000000000478ce0 0 913 2 Nov 4 18:56:17 poulenc kernel: Call Trace: Nov 4 18:56:17 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:17 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:17 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:17 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:17 poulenc kernel: scsi_tgtd/3 S 0000000000478ce0 0 914 2 Nov 4 18:56:17 poulenc kernel: Call Trace: Nov 4 18:56:17 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:17 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:17 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:17 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:17 poulenc kernel: scsi_tgtd/4 S 0000000000478ce0 0 915 2 Nov 4 18:56:17 poulenc kernel: Call Trace: Nov 4 18:56:17 poulenc kernel: [0000000000474c5c] 
worker_thread+0xa4/0xe0 Nov 4 18:56:17 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:17 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:17 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:17 poulenc kernel: scsi_tgtd/5 S 0000000000478ce0 0 916 2 Nov 4 18:56:17 poulenc kernel: Call Trace: Nov 4 18:56:17 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:17 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:17 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:17 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:17 poulenc kernel: scsi_tgtd/6 S 0000000000478ce0 0 917 2 Nov 4 18:56:17 poulenc kernel: Call Trace: Nov 4 18:56:17 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:17 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:17 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:17 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:17 poulenc kernel: scsi_tgtd/7 S 0000000000478ce0 0 918 2 Nov 4 18:56:17 poulenc kernel: Call Trace: Nov 4 18:56:17 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:17 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:17 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:17 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:17 poulenc kernel: scsi_tgtd/8 S 0000000000478ce0 0 919 2 Nov 4 18:56:17 poulenc kernel: Call Trace: Nov 4 18:56:17 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:17 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:17 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:17 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:18 poulenc kernel: scsi_tgtd/9 S 0000000000478ce0 0 920 2 Nov 4 18:56:18 poulenc kernel: Call Trace: Nov 4 18:56:18 poulenc kernel: [0000000000474c5c] 
worker_thread+0xa4/0xe0 Nov 4 18:56:18 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:18 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:18 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:18 poulenc kernel: scsi_tgtd/10 S 0000000000478ce0 0 921 2 Nov 4 18:56:18 poulenc kernel: Call Trace: Nov 4 18:56:18 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:18 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:18 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:18 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:18 poulenc kernel: scsi_tgtd/11 S 0000000000478ce0 0 922 2 Nov 4 18:56:18 poulenc kernel: Call Trace: Nov 4 18:56:18 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:18 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:18 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:18 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:18 poulenc kernel: scsi_tgtd/12 S 0000000000478ce0 0 923 2 Nov 4 18:56:18 poulenc kernel: Call Trace: Nov 4 18:56:18 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:18 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:18 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:18 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:18 poulenc kernel: scsi_tgtd/13 S 0000000000478ce0 0 924 2 Nov 4 18:56:18 poulenc kernel: Call Trace: Nov 4 18:56:18 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:18 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:18 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:18 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:18 poulenc kernel: scsi_tgtd/14 S 0000000000478ce0 0 925 2 Nov 4 18:56:18 poulenc kernel: Call Trace: Nov 4 18:56:18 poulenc kernel: 
[0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:18 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:18 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:18 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:18 poulenc kernel: scsi_tgtd/15 S 0000000000478ce0 0 926 2 Nov 4 18:56:18 poulenc kernel: Call Trace: Nov 4 18:56:18 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:18 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:18 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:18 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:18 poulenc kernel: scsi_tgtd/16 S 0000000000478ce0 0 927 2 Nov 4 18:56:19 poulenc kernel: Call Trace: Nov 4 18:56:19 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:19 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:19 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:19 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:19 poulenc kernel: scsi_tgtd/17 S 0000000000478ce0 0 928 2 Nov 4 18:56:19 poulenc kernel: Call Trace: Nov 4 18:56:19 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:19 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:19 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:19 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:19 poulenc kernel: scsi_tgtd/18 S 0000000000478ce0 0 929 2 Nov 4 18:56:19 poulenc kernel: Call Trace: Nov 4 18:56:19 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:19 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:19 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:19 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:19 poulenc kernel: scsi_tgtd/19 S 0000000000478ce0 0 930 2 Nov 4 18:56:19 poulenc kernel: Call Trace: Nov 4 18:56:19 poulenc 
kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:19 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:19 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:19 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:19 poulenc kernel: scsi_tgtd/20 S 0000000000478ce0 0 931 2 Nov 4 18:56:19 poulenc kernel: Call Trace: Nov 4 18:56:19 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:19 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:19 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:19 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:19 poulenc kernel: scsi_tgtd/21 S 0000000000478ce0 0 932 2 Nov 4 18:56:19 poulenc kernel: Call Trace: Nov 4 18:56:19 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:19 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:19 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:19 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:19 poulenc kernel: scsi_tgtd/22 S 0000000000478ce0 0 933 2 Nov 4 18:56:19 poulenc kernel: Call Trace: Nov 4 18:56:20 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:20 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:20 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:20 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:20 poulenc kernel: scsi_tgtd/23 S 0000000000478ce0 0 934 2 Nov 4 18:56:20 poulenc kernel: Call Trace: Nov 4 18:56:20 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:20 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:20 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:20 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:20 poulenc kernel: scsi_eh_0 S 0000000000478ce0 0 947 2 Nov 4 18:56:20 poulenc kernel: Call Trace: Nov 4 18:56:20 
poulenc kernel: [00000000005ae0a0] scsi_error_handler+0x48/0x5a0 Nov 4 18:56:20 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:20 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:20 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:20 poulenc kernel: md0_raid1 S 00000000005f2ee8 0 991 2 Nov 4 18:56:20 poulenc kernel: Call Trace: Nov 4 18:56:20 poulenc kernel: [000000000067ec30] schedule_timeout+0x78/0xc0 Nov 4 18:56:20 poulenc kernel: [00000000005f2ee8] md_thread+0xf0/0x140 Nov 4 18:56:20 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:20 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:20 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:20 poulenc kernel: kjournald S 0000000000478ce0 0 993 2 Nov 4 18:56:20 poulenc kernel: Call Trace: Nov 4 18:56:20 poulenc kernel: [000000000052da18] kjournald+0x1c0/0x1e0 Nov 4 18:56:20 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:20 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:20 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:20 poulenc kernel: udevd S 00000000004c7d68 0 1091 1 Nov 4 18:56:20 poulenc kernel: Call Trace: Nov 4 18:56:20 poulenc kernel: [000000000067ec30] schedule_timeout+0x78/0xc0 Nov 4 18:56:20 poulenc kernel: [00000000004c7d68] do_select+0x3d0/0x420 Nov 4 18:56:20 poulenc kernel: [00000000004eccb8] compat_core_sys_select+0x160/0x200 Nov 4 18:56:20 poulenc kernel: [00000000004eeee4] compat_sys_select+0x2c/0x1a0 Nov 4 18:56:20 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:20 poulenc kernel: [0000000000013590] 0x13598 Nov 4 18:56:20 poulenc kernel: scsi_eh_1 S 0000000000478ce0 0 1985 2 Nov 4 18:56:20 poulenc kernel: Call Trace: Nov 4 18:56:20 poulenc kernel: [00000000005ae0a0] scsi_error_handler+0x48/0x5a0 Nov 4 18:56:20 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:20 poulenc kernel: 
[00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:20 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:20 poulenc kernel: scsi_eh_2 S 0000000000478ce0 0 2093 2 Nov 4 18:56:20 poulenc kernel: Call Trace: Nov 4 18:56:21 poulenc kernel: [00000000005ae0a0] scsi_error_handler+0x48/0x5a0 Nov 4 18:56:21 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:21 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:21 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:21 poulenc kernel: ksnapd S 0000000000478ce0 0 2718 2 Nov 4 18:56:21 poulenc kernel: Call Trace: Nov 4 18:56:21 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:21 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:21 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:21 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:21 poulenc kernel: md6_raid1 S 00000000005f2ee8 0 2731 2 Nov 4 18:56:21 poulenc kernel: Call Trace: Nov 4 18:56:21 poulenc kernel: [000000000067ec30] schedule_timeout+0x78/0xc0 Nov 4 18:56:21 poulenc kernel: [00000000005f2ee8] md_thread+0xf0/0x140 Nov 4 18:56:21 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:21 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:21 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:21 poulenc kernel: md1_raid1 S 00000000005f2ee8 0 2748 2 Nov 4 18:56:21 poulenc kernel: Call Trace: Nov 4 18:56:21 poulenc kernel: [000000000067ec30] schedule_timeout+0x78/0xc0 Nov 4 18:56:21 poulenc kernel: [00000000005f2ee8] md_thread+0xf0/0x140 Nov 4 18:56:21 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:21 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:21 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:21 poulenc kernel: md2_raid1 S 00000000005f2ee8 0 2753 2 Nov 4 18:56:21 poulenc kernel: Call Trace: Nov 4 18:56:21 poulenc 
kernel: [000000000067ec30] schedule_timeout+0x78/0xc0 Nov 4 18:56:21 poulenc kernel: [00000000005f2ee8] md_thread+0xf0/0x140 Nov 4 18:56:21 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:21 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:21 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:21 poulenc kernel: md3_raid1 S 00000000005f2ee8 0 2758 2 Nov 4 18:56:21 poulenc kernel: Call Trace: Nov 4 18:56:21 poulenc kernel: [000000000067ec30] schedule_timeout+0x78/0xc0 Nov 4 18:56:21 poulenc kernel: [00000000005f2ee8] md_thread+0xf0/0x140 Nov 4 18:56:21 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:21 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:21 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:22 poulenc kernel: md4_raid1 S 00000000005f2ee8 0 2763 2 Nov 4 18:56:22 poulenc kernel: Call Trace: Nov 4 18:56:22 poulenc kernel: [000000000067ec30] schedule_timeout+0x78/0xc0 Nov 4 18:56:22 poulenc kernel: [00000000005f2ee8] md_thread+0xf0/0x140 Nov 4 18:56:22 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:22 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:22 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:22 poulenc kernel: md5_raid1 S 00000000005f2ee8 0 2768 2 Nov 4 18:56:22 poulenc kernel: Call Trace: Nov 4 18:56:22 poulenc kernel: [000000000067ec30] schedule_timeout+0x78/0xc0 Nov 4 18:56:22 poulenc kernel: [00000000005f2ee8] md_thread+0xf0/0x140 Nov 4 18:56:22 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:22 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:22 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:22 poulenc kernel: kjournald S 0000000000478ce0 0 2857 2 Nov 4 18:56:22 poulenc kernel: Call Trace: Nov 4 18:56:22 poulenc kernel: [000000000052da18] kjournald+0x1c0/0x1e0 Nov 4 18:56:22 poulenc kernel: [0000000000478ce0] 
kthread+0x48/0x80 Nov 4 18:56:22 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:22 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:22 poulenc kernel: kjournald S 0000000000478ce0 0 2870 2 Nov 4 18:56:22 poulenc kernel: Call Trace: Nov 4 18:56:22 poulenc kernel: [000000000052da18] kjournald+0x1c0/0x1e0 Nov 4 18:56:22 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:22 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:22 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:22 poulenc kernel: kjournald S 0000000000478ce0 0 2871 2 Nov 4 18:56:22 poulenc kernel: Call Trace: Nov 4 18:56:22 poulenc kernel: [000000000052da18] kjournald+0x1c0/0x1e0 Nov 4 18:56:22 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:22 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:22 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:22 poulenc kernel: kjournald S 0000000000478ce0 0 2872 2 Nov 4 18:56:22 poulenc kernel: Call Trace: Nov 4 18:56:22 poulenc kernel: [000000000052da18] kjournald+0x1c0/0x1e0 Nov 4 18:56:22 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:22 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:22 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:23 poulenc kernel: kjournald S 0000000000478ce0 0 2873 2 Nov 4 18:56:23 poulenc kernel: Call Trace: Nov 4 18:56:23 poulenc kernel: [000000000052da18] kjournald+0x1c0/0x1e0 Nov 4 18:56:23 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:23 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:23 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:23 poulenc kernel: portmap S 00000000004c776c 0 2994 1 Nov 4 18:56:23 poulenc kernel: Call Trace: Nov 4 18:56:23 poulenc kernel: [000000000067ec0c] schedule_timeout+0x54/0xc0 Nov 4 18:56:23 poulenc kernel: [00000000004c776c] 
do_sys_poll+0x234/0x400 Nov 4 18:56:23 poulenc kernel: [00000000004c7960] sys_poll+0x28/0x60 Nov 4 18:56:23 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:23 poulenc kernel: [00000000700025b8] 0x700025c0 Nov 4 18:56:23 poulenc kernel: rpc.statd S 00000000004c7d68 0 3006 1 Nov 4 18:56:23 poulenc kernel: Call Trace: Nov 4 18:56:23 poulenc kernel: [000000000067ec30] schedule_timeout+0x78/0xc0 Nov 4 18:56:23 poulenc kernel: [00000000004c7d68] do_select+0x3d0/0x420 Nov 4 18:56:23 poulenc kernel: [00000000004eccb8] compat_core_sys_select+0x160/0x200 Nov 4 18:56:23 poulenc kernel: [00000000004eeee4] compat_sys_select+0x2c/0x1a0 Nov 4 18:56:23 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:23 poulenc kernel: [00000000000143f4] 0x143fc Nov 4 18:56:23 poulenc kernel: rpciod/0 S 0000000000478ce0 0 3035 2 Nov 4 18:56:23 poulenc kernel: Call Trace: Nov 4 18:56:23 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:23 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:23 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:23 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:23 poulenc kernel: rpciod/1 S 0000000000478ce0 0 3036 2 Nov 4 18:56:23 poulenc kernel: Call Trace: Nov 4 18:56:23 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:23 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:23 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:23 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:23 poulenc kernel: rpciod/2 S 0000000000478ce0 0 3037 2 Nov 4 18:56:23 poulenc kernel: Call Trace: Nov 4 18:56:23 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:23 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:23 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:23 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 
4 18:56:23 poulenc kernel: rpciod/3 S 0000000000478ce0 0 3038 2 Nov 4 18:56:23 poulenc kernel: Call Trace: Nov 4 18:56:23 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:23 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:23 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:23 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:24 poulenc kernel: rpciod/4 S 0000000000478ce0 0 3039 2 Nov 4 18:56:24 poulenc kernel: Call Trace: Nov 4 18:56:24 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:24 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:24 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:24 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:24 poulenc kernel: rpciod/5 S 0000000000478ce0 0 3040 2 Nov 4 18:56:24 poulenc kernel: Call Trace: Nov 4 18:56:24 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:24 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:24 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:24 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:24 poulenc kernel: rpciod/6 S 0000000000478ce0 0 3041 2 Nov 4 18:56:24 poulenc kernel: Call Trace: Nov 4 18:56:24 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:24 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:24 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:24 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:24 poulenc kernel: rpciod/7 S 0000000000478ce0 0 3042 2 Nov 4 18:56:24 poulenc kernel: Call Trace: Nov 4 18:56:24 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:24 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:24 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:24 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 
18:56:24 poulenc kernel: rpciod/8 S 0000000000478ce0 0 3043 2 Nov 4 18:56:24 poulenc kernel: Call Trace: Nov 4 18:56:24 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:24 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:24 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:24 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:24 poulenc kernel: rpciod/9 S 0000000000478ce0 0 3044 2 Nov 4 18:56:24 poulenc kernel: Call Trace: Nov 4 18:56:24 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:24 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:24 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:24 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:24 poulenc kernel: rpciod/10 S 0000000000478ce0 0 3045 2 Nov 4 18:56:24 poulenc kernel: Call Trace: Nov 4 18:56:24 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:24 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:24 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:24 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:24 poulenc kernel: rpciod/11 S 0000000000478ce0 0 3046 2 Nov 4 18:56:24 poulenc kernel: Call Trace: Nov 4 18:56:24 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:24 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:24 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:24 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:24 poulenc kernel: rpciod/12 S 0000000000478ce0 0 3047 2 Nov 4 18:56:24 poulenc kernel: Call Trace: Nov 4 18:56:25 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:25 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:25 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:25 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 
18:56:25 poulenc kernel: rpciod/13 S 0000000000478ce0 0 3048 2 Nov 4 18:56:25 poulenc kernel: Call Trace: Nov 4 18:56:25 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:25 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:25 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:25 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:25 poulenc kernel: rpciod/14 S 0000000000478ce0 0 3049 2 Nov 4 18:56:25 poulenc kernel: Call Trace: Nov 4 18:56:25 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:25 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:25 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:25 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:25 poulenc kernel: rpciod/15 S 0000000000478ce0 0 3050 2 Nov 4 18:56:25 poulenc kernel: Call Trace: Nov 4 18:56:25 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:25 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:25 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:25 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:25 poulenc kernel: rpciod/16 S 0000000000478ce0 0 3051 2 Nov 4 18:56:25 poulenc kernel: Call Trace: Nov 4 18:56:25 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:25 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:25 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:25 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:25 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:25 poulenc kernel: rpciod/17 S 0000000000478ce0 0 3052 2 Nov 4 18:56:25 poulenc kernel: Call Trace: Nov 4 18:56:25 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:25 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:25 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 
18:56:25 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:25 poulenc kernel: rpciod/18 S 0000000000478ce0 0 3053 2 Nov 4 18:56:25 poulenc kernel: Call Trace: Nov 4 18:56:25 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:25 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:25 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:25 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:25 poulenc kernel: rpciod/19 S 0000000000478ce0 0 3054 2 Nov 4 18:56:25 poulenc kernel: Call Trace: Nov 4 18:56:25 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:25 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:25 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:25 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:26 poulenc kernel: rpciod/20 S 0000000000478ce0 0 3055 2 Nov 4 18:56:26 poulenc kernel: Call Trace: Nov 4 18:56:26 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:26 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:26 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:26 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:26 poulenc kernel: rpciod/21 S 0000000000478ce0 0 3056 2 Nov 4 18:56:26 poulenc kernel: Call Trace: Nov 4 18:56:26 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:26 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:26 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:26 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:26 poulenc kernel: rpciod/22 S 0000000000478ce0 0 3057 2 Nov 4 18:56:26 poulenc kernel: Call Trace: Nov 4 18:56:26 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:26 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:26 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 
18:56:26 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:26 poulenc kernel: rpciod/23 S 0000000000478ce0 0 3058 2 Nov 4 18:56:26 poulenc kernel: Call Trace: Nov 4 18:56:26 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:26 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:26 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:26 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:26 poulenc kernel: rpc.idmapd S 00000000004ea8fc 0 3139 1 Nov 4 18:56:26 poulenc kernel: Call Trace: Nov 4 18:56:26 poulenc kernel: [000000000067ec30] schedule_timeout+0x78/0xc0 Nov 4 18:56:26 poulenc kernel: [00000000004ea8fc] sys_epoll_wait+0x144/0x480 Nov 4 18:56:26 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:26 poulenc kernel: [00000000f7f2515c] 0xf7f25164 Nov 4 18:56:26 poulenc kernel: syslogd S 00000000004c7d68 0 3244 1 Nov 4 18:56:26 poulenc kernel: Call Trace: Nov 4 18:56:26 poulenc kernel: [000000000067ec30] schedule_timeout+0x78/0xc0 Nov 4 18:56:26 poulenc kernel: [00000000004c7d68] do_select+0x3d0/0x420 Nov 4 18:56:26 poulenc kernel: [00000000004eccb8] compat_core_sys_select+0x160/0x200 Nov 4 18:56:26 poulenc kernel: [00000000004eeee4] compat_sys_select+0x2c/0x1a0 Nov 4 18:56:26 poulenc kernel: [000000000002a32c] 0x2a334 Nov 4 18:56:26 poulenc kernel: [0000000000014910] 0x14918 Nov 4 18:56:26 poulenc kernel: klogd R running task 0 3254 1 Nov 4 18:56:26 poulenc kernel: named S 00000000004061d4 0 3270 1 Nov 4 18:56:26 poulenc kernel: Call Trace: Nov 4 18:56:26 poulenc kernel: [000000000048b6d0] compat_sys_rt_sigsuspend+0x98/0xe0 Nov 4 18:56:26 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:26 poulenc kernel: [00000000f79bb0b0] 0xf79bb0b8 Nov 4 18:56:26 poulenc kernel: named S 00000000004833bc 0 3271 1 Nov 4 18:56:26 poulenc kernel: Call Trace: Nov 4 18:56:26 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 
18:56:27 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:27 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:27 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:27 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:27 poulenc kernel: named S 00000000004833bc 0 3272 1 Nov 4 18:56:27 poulenc kernel: Call Trace: Nov 4 18:56:27 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:27 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:27 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:27 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:27 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:27 poulenc kernel: named S 00000000004833bc 0 3273 1 Nov 4 18:56:27 poulenc kernel: Call Trace: Nov 4 18:56:27 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:27 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:27 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:27 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:27 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:27 poulenc kernel: named S 00000000004833bc 0 3274 1 Nov 4 18:56:27 poulenc kernel: Call Trace: Nov 4 18:56:27 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:27 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:27 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:27 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:27 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:27 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:27 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:27 poulenc kernel: named S 00000000004833bc 0 3275 1 Nov 4 18:56:27 poulenc kernel: Call Trace: Nov 4 18:56:27 poulenc kernel: 
[0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:27 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:27 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:27 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:27 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:27 poulenc kernel: named S 00000000004833bc 0 3276 1 Nov 4 18:56:27 poulenc kernel: Call Trace: Nov 4 18:56:27 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:27 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:27 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:27 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:27 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:27 poulenc kernel: named S 00000000004833bc 0 3277 1 Nov 4 18:56:27 poulenc kernel: Call Trace: Nov 4 18:56:27 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:28 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:28 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:28 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:28 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:28 poulenc kernel: named S 00000000004833bc 0 3278 1 Nov 4 18:56:28 poulenc kernel: Call Trace: Nov 4 18:56:28 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:28 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:28 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:28 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:28 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:28 poulenc kernel: named S 00000000004833bc 0 3279 1 Nov 4 18:56:28 poulenc kernel: Call Trace: Nov 4 18:56:28 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:28 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 
Nov 4 18:56:28 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:28 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:28 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:28 poulenc kernel: named S 00000000004833bc 0 3280 1 Nov 4 18:56:28 poulenc kernel: Call Trace: Nov 4 18:56:28 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:28 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:28 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:28 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:28 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:28 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:28 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:28 poulenc kernel: named S 00000000004833bc 0 3281 1 Nov 4 18:56:28 poulenc kernel: Call Trace: Nov 4 18:56:28 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:28 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:28 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:28 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:28 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:28 poulenc kernel: named S 00000000004833bc 0 3282 1 Nov 4 18:56:28 poulenc kernel: Call Trace: Nov 4 18:56:28 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:28 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:28 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:29 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:29 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:29 poulenc kernel: named S 00000000004833bc 0 3283 1 Nov 4 18:56:29 poulenc kernel: Call Trace: Nov 4 18:56:29 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:29 poulenc kernel: 
[00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:29 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:29 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:29 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:29 poulenc kernel: named S 00000000004833bc 0 3284 1 Nov 4 18:56:29 poulenc kernel: Call Trace: Nov 4 18:56:29 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:29 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:29 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:29 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:29 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:29 poulenc kernel: named S 00000000004833bc 0 3285 1 Nov 4 18:56:29 poulenc kernel: Call Trace: Nov 4 18:56:29 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:29 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:29 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:29 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:29 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:29 poulenc kernel: named S 00000000004833bc 0 3286 1 Nov 4 18:56:29 poulenc kernel: Call Trace: Nov 4 18:56:29 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:29 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:29 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:29 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:29 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:29 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:29 poulenc kernel: named S 00000000004833bc 0 3287 1 Nov 4 18:56:29 poulenc kernel: Call Trace: Nov 4 18:56:29 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:29 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 
18:56:29 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:29 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:29 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:29 poulenc kernel: named S 00000000004833bc 0 3288 1 Nov 4 18:56:29 poulenc kernel: Call Trace: Nov 4 18:56:29 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:29 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:29 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:29 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:30 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:30 poulenc kernel: named S 00000000004833bc 0 3289 1 Nov 4 18:56:30 poulenc kernel: Call Trace: Nov 4 18:56:30 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:30 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:30 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:30 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:30 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:30 poulenc kernel: named S 00000000004833bc 0 3290 1 Nov 4 18:56:30 poulenc kernel: Call Trace: Nov 4 18:56:30 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:30 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:30 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:30 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:30 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:30 poulenc kernel: named S 00000000004833bc 0 3291 1 Nov 4 18:56:30 poulenc kernel: Call Trace: Nov 4 18:56:30 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:30 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:30 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:30 poulenc kernel: 
[00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:30 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:30 poulenc kernel: named S 00000000004833bc 0 3292 1 Nov 4 18:56:30 poulenc kernel: Call Trace: Nov 4 18:56:30 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:30 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:30 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:30 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:30 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:30 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:30 poulenc kernel: named S 00000000004833bc 0 3293 1 Nov 4 18:56:30 poulenc kernel: Call Trace: Nov 4 18:56:30 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:30 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:30 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:30 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:30 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:30 poulenc kernel: named S 00000000004833bc 0 3294 1 Nov 4 18:56:30 poulenc kernel: Call Trace: Nov 4 18:56:30 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:30 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:31 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:31 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:31 poulenc kernel: [00000000f7afb2e4] 0xf7afb2ec Nov 4 18:56:31 poulenc kernel: named S 00000000004833bc 0 3295 1 Nov 4 18:56:31 poulenc kernel: Call Trace: Nov 4 18:56:31 poulenc kernel: [0000000000482e1c] futex_wait+0x1c4/0x2c0 Nov 4 18:56:31 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:31 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:31 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 
Nov 4 18:56:31 poulenc kernel: [00000000f7afb638] 0xf7afb640 Nov 4 18:56:31 poulenc kernel: named S 00000000004c7d68 0 3296 1 Nov 4 18:56:31 poulenc kernel: Call Trace: Nov 4 18:56:31 poulenc kernel: [000000000067ec30] schedule_timeout+0x78/0xc0 Nov 4 18:56:31 poulenc kernel: [00000000004c7d68] do_select+0x3d0/0x420 Nov 4 18:56:31 poulenc kernel: [00000000004eccb8] compat_core_sys_select+0x160/0x200 Nov 4 18:56:31 poulenc kernel: [00000000004eeee4] compat_sys_select+0x2c/0x1a0 Nov 4 18:56:31 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:31 poulenc kernel: [00000000f7a60070] 0xf7a60078 Nov 4 18:56:31 poulenc kernel: rpc.bootparam S 00000000004c776c 0 3518 1 Nov 4 18:56:31 poulenc kernel: Call Trace: Nov 4 18:56:31 poulenc kernel: [000000000067ec0c] schedule_timeout+0x54/0xc0 Nov 4 18:56:31 poulenc kernel: [00000000004c776c] do_sys_poll+0x234/0x400 Nov 4 18:56:31 poulenc kernel: [00000000004c7960] sys_poll+0x28/0x60 Nov 4 18:56:31 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:31 poulenc kernel: [0000000000011bbc] 0x11bc4 Nov 4 18:56:31 poulenc kernel: hddtemp S 00000000004c7d68 0 3576 1 Nov 4 18:56:31 poulenc kernel: Call Trace: Nov 4 18:56:31 poulenc kernel: [000000000067ec30] schedule_timeout+0x78/0xc0 Nov 4 18:56:31 poulenc kernel: [00000000004c7d68] do_select+0x3d0/0x420 Nov 4 18:56:31 poulenc kernel: [00000000004eccb8] compat_core_sys_select+0x160/0x200 Nov 4 18:56:31 poulenc kernel: [00000000004eeee4] compat_sys_select+0x2c/0x1a0 Nov 4 18:56:31 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:31 poulenc kernel: [000000000001223c] 0x12244 Nov 4 18:56:31 poulenc kernel: lockd S 000000001008aff4 0 3681 2 Nov 4 18:56:31 poulenc kernel: Call Trace: Nov 4 18:56:31 poulenc kernel: [000000000067ec30] schedule_timeout+0x78/0xc0 Nov 4 18:56:31 poulenc kernel: [000000001008aff4] svc_recv+0x21c/0x4e0 [sunrpc] Nov 4 18:56:31 poulenc kernel: [00000000100baef8] lockd+0x120/0x300 
[lockd] Nov 4 18:56:31 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:32 poulenc kernel: [0000000010087e10] __svc_create_thread+0x118/0x220 [sunrpc] Nov 4 18:56:32 poulenc kernel: nfsd4 S 0000000000478ce0 0 3682 2 Nov 4 18:56:32 poulenc kernel: Call Trace: Nov 4 18:56:32 poulenc kernel: [0000000000474c5c] worker_thread+0xa4/0xe0 Nov 4 18:56:32 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:32 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:32 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:32 poulenc kernel: nfsd S 000000001008aff4 0 3683 2 Nov 4 18:56:32 poulenc kernel: Call Trace: Nov 4 18:56:32 poulenc kernel: [000000000067ec0c] schedule_timeout+0x54/0xc0 Nov 4 18:56:32 poulenc kernel: [000000001008aff4] svc_recv+0x21c/0x4e0 [sunrpc] Nov 4 18:56:32 poulenc kernel: [0000000010150aac] nfsd+0xb4/0x300 [nfsd] Nov 4 18:56:32 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:32 poulenc kernel: [0000000010087e10] __svc_create_thread+0x118/0x220 [sunrpc] Nov 4 18:56:32 poulenc kernel: rpc.mountd S 00000000004c7d68 0 3694 1 Nov 4 18:56:32 poulenc kernel: Call Trace: Nov 4 18:56:32 poulenc kernel: [000000000067ec30] schedule_timeout+0x78/0xc0 Nov 4 18:56:32 poulenc kernel: [00000000004c7d68] do_select+0x3d0/0x420 Nov 4 18:56:32 poulenc kernel: [00000000004eccb8] compat_core_sys_select+0x160/0x200 Nov 4 18:56:32 poulenc kernel: [00000000004eeee4] compat_sys_select+0x2c/0x1a0 Nov 4 18:56:32 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:32 poulenc kernel: [0000000000016708] 0x16710 Nov 4 18:56:32 poulenc kernel: iscsid S 000000000048c0d4 0 3785 1 Nov 4 18:56:32 poulenc kernel: Call Trace: Nov 4 18:56:32 poulenc kernel: [000000000067ec0c] schedule_timeout+0x54/0xc0 Nov 4 18:56:32 poulenc kernel: [000000000048c0d4] compat_sys_nanosleep+0x7c/0xe0 Nov 4 18:56:32 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:32 
poulenc kernel: [00000000f7ec168c] 0xf7ec1694 Nov 4 18:56:32 poulenc kernel: iscsid S 00000000004c776c 0 3786 1 Nov 4 18:56:32 poulenc kernel: Call Trace: Nov 4 18:56:32 poulenc kernel: [000000000067ec0c] schedule_timeout+0x54/0xc0 Nov 4 18:56:32 poulenc kernel: [00000000004c776c] do_sys_poll+0x234/0x400 Nov 4 18:56:32 poulenc kernel: [00000000004c7960] sys_poll+0x28/0x60 Nov 4 18:56:32 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:32 poulenc kernel: [0000000000045bf0] 0x45bf8 Nov 4 18:56:32 poulenc kernel: inetd S 00000000004c7d68 0 3802 1 Nov 4 18:56:32 poulenc kernel: Call Trace: Nov 4 18:56:32 poulenc kernel: [000000000067ec30] schedule_timeout+0x78/0xc0 Nov 4 18:56:32 poulenc kernel: [00000000004c7d68] do_select+0x3d0/0x420 Nov 4 18:56:32 poulenc kernel: [00000000004eccb8] compat_core_sys_select+0x160/0x200 Nov 4 18:56:32 poulenc kernel: [00000000004eeee4] compat_sys_select+0x2c/0x1a0 Nov 4 18:56:32 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:32 poulenc kernel: [00000000000155d8] 0x155e0 Nov 4 18:56:32 poulenc kernel: rarpd S 00000000004c776c 0 3809 1 Nov 4 18:56:32 poulenc kernel: Call Trace: Nov 4 18:56:32 poulenc kernel: [000000000067ec0c] schedule_timeout+0x54/0xc0 Nov 4 18:56:33 poulenc kernel: [00000000004c776c] do_sys_poll+0x234/0x400 Nov 4 18:56:33 poulenc kernel: [00000000004c7960] sys_poll+0x28/0x60 Nov 4 18:56:33 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:33 poulenc kernel: [00000000f7da7234] 0xf7da723c Nov 4 18:56:33 poulenc kernel: smartd S 000000000048c0d4 0 3814 1 Nov 4 18:56:33 poulenc kernel: Call Trace: Nov 4 18:56:33 poulenc kernel: [000000000067ec0c] schedule_timeout+0x54/0xc0 Nov 4 18:56:33 poulenc kernel: [000000000048c0d4] compat_sys_nanosleep+0x7c/0xe0 Nov 4 18:56:33 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:33 poulenc kernel: [00000000f7c5d68c] 0xf7c5d694 Nov 4 18:56:33 poulenc kernel: snmpd S 
00000000004c7d68 0 3823 1 Nov 4 18:56:33 poulenc kernel: Call Trace: Nov 4 18:56:33 poulenc kernel: [000000000067ec0c] schedule_timeout+0x54/0xc0 Nov 4 18:56:33 poulenc kernel: [00000000004c7d68] do_select+0x3d0/0x420 Nov 4 18:56:33 poulenc kernel: [00000000004eccb8] compat_core_sys_select+0x160/0x200 Nov 4 18:56:33 poulenc kernel: [00000000004eef74] compat_sys_select+0xbc/0x1a0 Nov 4 18:56:33 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:33 poulenc kernel: [0000000000012e64] 0x12e6c Nov 4 18:56:33 poulenc kernel: sendmail-mta S 00000000004c7d68 0 3880 1 Nov 4 18:56:33 poulenc kernel: Call Trace: Nov 4 18:56:33 poulenc kernel: [000000000067ec0c] schedule_timeout+0x54/0xc0 Nov 4 18:56:33 poulenc kernel: [00000000004c7d68] do_select+0x3d0/0x420 Nov 4 18:56:33 poulenc kernel: [00000000004eccb8] compat_core_sys_select+0x160/0x200 Nov 4 18:56:33 poulenc kernel: [00000000004eef74] compat_sys_select+0xbc/0x1a0 Nov 4 18:56:33 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:33 poulenc kernel: [000000007003b1d8] 0x7003b1e0 Nov 4 18:56:33 poulenc kernel: ntpd S 00000000004c7d68 0 3909 1 Nov 4 18:56:33 poulenc kernel: Call Trace: Nov 4 18:56:33 poulenc kernel: [000000000067ec30] schedule_timeout+0x78/0xc0 Nov 4 18:56:33 poulenc kernel: [00000000004c7d68] do_select+0x3d0/0x420 Nov 4 18:56:33 poulenc kernel: [00000000004eccb8] compat_core_sys_select+0x160/0x200 Nov 4 18:56:33 poulenc kernel: [00000000004eeee4] compat_sys_select+0x2c/0x1a0 Nov 4 18:56:33 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:33 poulenc kernel: [000000000001aea8] 0x1aeb0 Nov 4 18:56:33 poulenc kernel: mdadm S 00000000004c7d68 0 3922 1 Nov 4 18:56:33 poulenc kernel: Call Trace: Nov 4 18:56:33 poulenc kernel: [000000000067ec0c] schedule_timeout+0x54/0xc0 Nov 4 18:56:33 poulenc kernel: [00000000004c7d68] do_select+0x3d0/0x420 Nov 4 18:56:33 poulenc kernel: [00000000004eccb8] compat_core_sys_select+0x160/0x200 Nov 
4 18:56:33 poulenc kernel: [00000000004eef74] compat_sys_select+0xbc/0x1a0 Nov 4 18:56:33 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:33 poulenc kernel: [0000000000016cc0] 0x16cc8 Nov 4 18:56:33 poulenc kernel: rsync S 00000000004c7d68 0 3943 1 Nov 4 18:56:34 poulenc kernel: Call Trace: Nov 4 18:56:34 poulenc kernel: [000000000067ec30] schedule_timeout+0x78/0xc0 Nov 4 18:56:34 poulenc kernel: [00000000004c7d68] do_select+0x3d0/0x420 Nov 4 18:56:34 poulenc kernel: [00000000004eccb8] compat_core_sys_select+0x160/0x200 Nov 4 18:56:34 poulenc kernel: [00000000004eeee4] compat_sys_select+0x2c/0x1a0 Nov 4 18:56:34 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:34 poulenc kernel: [0000000000035418] 0x35420 Nov 4 18:56:34 poulenc kernel: atd S 000000000048c0d4 0 3959 1 Nov 4 18:56:34 poulenc kernel: Call Trace: Nov 4 18:56:34 poulenc kernel: [000000000067ec0c] schedule_timeout+0x54/0xc0 Nov 4 18:56:34 poulenc kernel: [000000000048c0d4] compat_sys_nanosleep+0x7c/0xe0 Nov 4 18:56:34 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:34 poulenc kernel: [00000000f7e6d68c] 0xf7e6d694 Nov 4 18:56:34 poulenc kernel: cron S 000000000048c0d4 0 3966 1 Nov 4 18:56:34 poulenc kernel: Call Trace: Nov 4 18:56:34 poulenc kernel: [000000000067ec0c] schedule_timeout+0x54/0xc0 Nov 4 18:56:34 poulenc kernel: [000000000048c0d4] compat_sys_nanosleep+0x7c/0xe0 Nov 4 18:56:34 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:34 poulenc kernel: [00000000f7df968c] 0xf7df9694 Nov 4 18:56:34 poulenc kernel: watchdog S 000000000048c0d4 0 3976 1 Nov 4 18:56:34 poulenc kernel: Call Trace: Nov 4 18:56:34 poulenc kernel: [000000000067ec0c] schedule_timeout+0x54/0xc0 Nov 4 18:56:34 poulenc kernel: [000000000048c0d4] compat_sys_nanosleep+0x7c/0xe0 Nov 4 18:56:34 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:34 poulenc kernel: [00000000f7e4568c] 
0xf7e45694 Nov 4 18:56:34 poulenc kernel: apache2 S 00000000004c7d68 0 3994 1 Nov 4 18:56:34 poulenc kernel: Call Trace: Nov 4 18:56:34 poulenc kernel: [000000000067ec0c] schedule_timeout+0x54/0xc0 Nov 4 18:56:34 poulenc kernel: [00000000004c7d68] do_select+0x3d0/0x420 Nov 4 18:56:34 poulenc kernel: [00000000004eccb8] compat_core_sys_select+0x160/0x200 Nov 4 18:56:34 poulenc kernel: [00000000004eef74] compat_sys_select+0xbc/0x1a0 Nov 4 18:56:34 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:34 poulenc kernel: [00000000f7be6c74] 0xf7be6c7c Nov 4 18:56:34 poulenc kernel: fail2ban-serv S 00000000004c7d68 0 4011 1 Nov 4 18:56:34 poulenc kernel: Call Trace: Nov 4 18:56:34 poulenc kernel: [000000000067ec0c] schedule_timeout+0x54/0xc0 Nov 4 18:56:34 poulenc kernel: [00000000004c7d68] do_select+0x3d0/0x420 Nov 4 18:56:34 poulenc kernel: [00000000004eccb8] compat_core_sys_select+0x160/0x200 Nov 4 18:56:34 poulenc kernel: [00000000004eef74] compat_sys_select+0xbc/0x1a0 Nov 4 18:56:34 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:34 poulenc kernel: [00000000f7dc0070] 0xf7dc0078 Nov 4 18:56:34 poulenc kernel: fail2ban-serv S 00000000004c776c 0 4012 1 Nov 4 18:56:34 poulenc kernel: Call Trace: Nov 4 18:56:34 poulenc kernel: [000000000067ec0c] schedule_timeout+0x54/0xc0 Nov 4 18:56:34 poulenc kernel: [00000000004c776c] do_sys_poll+0x234/0x400 Nov 4 18:56:34 poulenc kernel: [00000000004c7960] sys_poll+0x28/0x60 Nov 4 18:56:34 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:34 poulenc kernel: [00000000f7dbd0d4] 0xf7dbd0dc Nov 4 18:56:34 poulenc kernel: fail2ban-serv S 00000000004c7d68 0 4038 1 Nov 4 18:56:34 poulenc kernel: Call Trace: Nov 4 18:56:34 poulenc kernel: [000000000067ec0c] schedule_timeout+0x54/0xc0 Nov 4 18:56:34 poulenc kernel: [00000000004c7d68] do_select+0x3d0/0x420 Nov 4 18:56:35 poulenc kernel: [00000000004eccb8] compat_core_sys_select+0x160/0x200 Nov 4 18:56:35 
poulenc kernel: [00000000004eef74] compat_sys_select+0xbc/0x1a0 Nov 4 18:56:35 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:35 poulenc kernel: [00000000f7dc0070] 0xf7dc0078 Nov 4 18:56:35 poulenc kernel: fail2ban-serv S 00000000004c7d68 0 4039 1 Nov 4 18:56:35 poulenc kernel: Call Trace: Nov 4 18:56:35 poulenc kernel: [000000000067ec0c] schedule_timeout+0x54/0xc0 Nov 4 18:56:35 poulenc kernel: [00000000004c7d68] do_select+0x3d0/0x420 Nov 4 18:56:35 poulenc kernel: [00000000004eccb8] compat_core_sys_select+0x160/0x200 Nov 4 18:56:35 poulenc kernel: [00000000004eef74] compat_sys_select+0xbc/0x1a0 Nov 4 18:56:35 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:35 poulenc kernel: [00000000f7dc0070] 0xf7dc0078 Nov 4 18:56:35 poulenc kernel: apache2 S 00000000004ea8fc 0 4071 3994 Nov 4 18:56:35 poulenc kernel: Call Trace: Nov 4 18:56:35 poulenc kernel: [000000000067ec30] schedule_timeout+0x78/0xc0 Nov 4 18:56:35 poulenc kernel: [00000000004ea8fc] sys_epoll_wait+0x144/0x480 Nov 4 18:56:35 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:35 poulenc kernel: [00000000f7be2710] 0xf7be2718 Nov 4 18:56:35 poulenc kernel: apache2 S 00000000004061d4 0 4072 3994 Nov 4 18:56:35 poulenc kernel: Call Trace: Nov 4 18:56:35 poulenc kernel: [00000000005333bc] sys_semtimedop+0x564/0x660 Nov 4 18:56:35 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:35 poulenc kernel: [00000000f7bdaea8] 0xf7bdaeb0 Nov 4 18:56:35 poulenc kernel: apache2 S 00000000004061d4 0 4073 3994 Nov 4 18:56:35 poulenc kernel: Call Trace: Nov 4 18:56:35 poulenc kernel: [00000000005333bc] sys_semtimedop+0x564/0x660 Nov 4 18:56:35 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:35 poulenc kernel: [00000000f7bdaea8] 0xf7bdaeb0 Nov 4 18:56:35 poulenc kernel: apache2 S 00000000004061d4 0 4074 3994 Nov 4 18:56:35 poulenc kernel: Call Trace: Nov 4 18:56:35 poulenc kernel: 
[00000000005333bc] sys_semtimedop+0x564/0x660 Nov 4 18:56:35 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:35 poulenc kernel: [00000000f7bdaea8] 0xf7bdaeb0 Nov 4 18:56:35 poulenc kernel: apache2 S 00000000004061d4 0 4075 3994 Nov 4 18:56:35 poulenc kernel: Call Trace: Nov 4 18:56:35 poulenc kernel: [00000000005333bc] sys_semtimedop+0x564/0x660 Nov 4 18:56:35 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:35 poulenc kernel: [00000000f7bdaea8] 0xf7bdaeb0 Nov 4 18:56:35 poulenc kernel: login S 000000000048bda8 0 4146 1 Nov 4 18:56:35 poulenc kernel: Call Trace: Nov 4 18:56:35 poulenc kernel: [00000000004656dc] do_wait+0x264/0xda0 Nov 4 18:56:36 poulenc kernel: [000000000048bda8] compat_sys_wait4+0xb0/0xc0 Nov 4 18:56:36 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:36 poulenc kernel: [00000000f7df3234] 0xf7df323c Nov 4 18:56:36 poulenc kernel: bash R running task 0 4173 4146 Nov 4 18:56:36 poulenc kernel: sshd S 00000000004c7d68 0 4330 1 Nov 4 18:56:36 poulenc kernel: Call Trace: Nov 4 18:56:36 poulenc kernel: [000000000067ec30] schedule_timeout+0x78/0xc0 Nov 4 18:56:36 poulenc kernel: [00000000004c7d68] do_select+0x3d0/0x420 Nov 4 18:56:36 poulenc kernel: [00000000004eccb8] compat_core_sys_select+0x160/0x200 Nov 4 18:56:36 poulenc kernel: [00000000004eeee4] compat_sys_select+0x2c/0x1a0 Nov 4 18:56:36 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:36 poulenc kernel: [0000000070017e1c] 0x70017e24 Nov 4 18:56:36 poulenc kernel: md_d0_raid5 R running task 0 4462 2 Nov 4 18:56:36 poulenc kernel: ietd S 000000000067dac4 0 8227 1 Nov 4 18:56:36 poulenc kernel: Call Trace: Nov 4 18:56:36 poulenc kernel: [000000000067d9dc] __down_interruptible+0xa4/0x1c0 Nov 4 18:56:36 poulenc kernel: [000000000067dac4] __down_interruptible+0x18c/0x1c0 Nov 4 18:56:36 poulenc kernel: [00000000102204d0] ioctl+0x58/0x5e0 [iscsi_trgt] Nov 4 18:56:36 poulenc kernel: 
[00000000004f0484] compat_sys_ioctl+0x14c/0x460 Nov 4 18:56:36 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:36 poulenc kernel: [000000000001532c] 0x15334 Nov 4 18:56:36 poulenc kernel: istd1 R running task 0 8228 2 Nov 4 18:56:36 poulenc kernel: istiod1 D 00000000102262c8 0 8229 2 Nov 4 18:56:36 poulenc kernel: Call Trace: Nov 4 18:56:36 poulenc kernel: [000000000067e4cc] wait_for_completion+0x74/0xe0 Nov 4 18:56:36 poulenc kernel: [00000000102262c8] blockio_make_request+0x1d0/0x24c [iscsi_trgt] Nov 4 18:56:36 poulenc kernel: [000000001021a140] tio_write+0x28/0x80 [iscsi_trgt] Nov 4 18:56:36 poulenc kernel: [0000000010223df8] build_write_response+0x40/0xe0 [iscsi_trgt] Nov 4 18:56:36 poulenc kernel: [000000001021e444] send_scsi_rsp+0xc/0x120 [iscsi_trgt] Nov 4 18:56:36 poulenc kernel: [0000000010223c30] disk_execute_cmnd+0x158/0x220 [iscsi_trgt] Nov 4 18:56:36 poulenc kernel: [0000000010220330] worker_thread+0x118/0x1a0 [iscsi_trgt] Nov 4 18:56:36 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:36 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:36 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:36 poulenc kernel: istiod1 D 00000000102262c8 0 8230 2 Nov 4 18:56:36 poulenc kernel: Call Trace: Nov 4 18:56:36 poulenc kernel: [000000000067e4cc] wait_for_completion+0x74/0xe0 Nov 4 18:56:36 poulenc kernel: [00000000102262c8] blockio_make_request+0x1d0/0x24c [iscsi_trgt] Nov 4 18:56:36 poulenc kernel: [000000001021a140] tio_write+0x28/0x80 [iscsi_trgt] Nov 4 18:56:36 poulenc kernel: [0000000010223df8] build_write_response+0x40/0xe0 [iscsi_trgt] Nov 4 18:56:36 poulenc kernel: [000000001021e444] send_scsi_rsp+0xc/0x120 [iscsi_trgt] Nov 4 18:56:36 poulenc kernel: [0000000010223c30] disk_execute_cmnd+0x158/0x220 [iscsi_trgt] Nov 4 18:56:36 poulenc kernel: [0000000010220330] worker_thread+0x118/0x1a0 [iscsi_trgt] Nov 4 18:56:37 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 
18:56:37 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:37 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:37 poulenc kernel: istiod1 D 00000000102262c8 0 8231 2 Nov 4 18:56:37 poulenc kernel: Call Trace: Nov 4 18:56:37 poulenc kernel: [000000000067e4cc] wait_for_completion+0x74/0xe0 Nov 4 18:56:37 poulenc kernel: [00000000102262c8] blockio_make_request+0x1d0/0x24c [iscsi_trgt] Nov 4 18:56:37 poulenc kernel: [000000001021a140] tio_write+0x28/0x80 [iscsi_trgt] Nov 4 18:56:37 poulenc kernel: [0000000010223df8] build_write_response+0x40/0xe0 [iscsi_trgt] Nov 4 18:56:37 poulenc kernel: [000000001021e444] send_scsi_rsp+0xc/0x120 [iscsi_trgt] Nov 4 18:56:37 poulenc kernel: [0000000010223c30] disk_execute_cmnd+0x158/0x220 [iscsi_trgt] Nov 4 18:56:37 poulenc kernel: [0000000010220330] worker_thread+0x118/0x1a0 [iscsi_trgt] Nov 4 18:56:37 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:37 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:37 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:37 poulenc kernel: istiod1 D 00000000102262c8 0 8232 2 Nov 4 18:56:37 poulenc kernel: Call Trace: Nov 4 18:56:37 poulenc kernel: [000000000067e4cc] wait_for_completion+0x74/0xe0 Nov 4 18:56:37 poulenc kernel: [00000000102262c8] blockio_make_request+0x1d0/0x24c [iscsi_trgt] Nov 4 18:56:37 poulenc kernel: [000000001021a140] tio_write+0x28/0x80 [iscsi_trgt] Nov 4 18:56:37 poulenc kernel: [0000000010223df8] build_write_response+0x40/0xe0 [iscsi_trgt] Nov 4 18:56:37 poulenc kernel: [000000001021e444] send_scsi_rsp+0xc/0x120 [iscsi_trgt] Nov 4 18:56:37 poulenc kernel: [0000000010223c30] disk_execute_cmnd+0x158/0x220 [iscsi_trgt] Nov 4 18:56:37 poulenc kernel: [0000000010220330] worker_thread+0x118/0x1a0 [iscsi_trgt] Nov 4 18:56:37 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:37 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:37 poulenc kernel: 
[0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:37 poulenc kernel: istiod1 D 00000000102262c8 0 8233 2 Nov 4 18:56:37 poulenc kernel: Call Trace: Nov 4 18:56:37 poulenc kernel: [000000000067e4cc] wait_for_completion+0x74/0xe0 Nov 4 18:56:37 poulenc kernel: [00000000102262c8] blockio_make_request+0x1d0/0x24c [iscsi_trgt] Nov 4 18:56:37 poulenc kernel: [000000001021a140] tio_write+0x28/0x80 [iscsi_trgt] Nov 4 18:56:37 poulenc kernel: [0000000010223df8] build_write_response+0x40/0xe0 [iscsi_trgt] Nov 4 18:56:37 poulenc kernel: [000000001021e444] send_scsi_rsp+0xc/0x120 [iscsi_trgt] Nov 4 18:56:37 poulenc kernel: [0000000010223c30] disk_execute_cmnd+0x158/0x220 [iscsi_trgt] Nov 4 18:56:37 poulenc kernel: [0000000010220330] worker_thread+0x118/0x1a0 [iscsi_trgt] Nov 4 18:56:37 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:37 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:37 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:37 poulenc kernel: istiod1 D 00000000102262c8 0 8234 2 Nov 4 18:56:37 poulenc kernel: Call Trace: Nov 4 18:56:37 poulenc kernel: [000000000067e4cc] wait_for_completion+0x74/0xe0 Nov 4 18:56:37 poulenc kernel: [00000000102262c8] blockio_make_request+0x1d0/0x24c [iscsi_trgt] Nov 4 18:56:37 poulenc kernel: [000000001021a140] tio_write+0x28/0x80 [iscsi_trgt] Nov 4 18:56:37 poulenc kernel: [0000000010223df8] build_write_response+0x40/0xe0 [iscsi_trgt] Nov 4 18:56:37 poulenc kernel: [000000001021e444] send_scsi_rsp+0xc/0x120 [iscsi_trgt] Nov 4 18:56:37 poulenc kernel: [0000000010223c30] disk_execute_cmnd+0x158/0x220 [iscsi_trgt] Nov 4 18:56:37 poulenc kernel: [0000000010220330] worker_thread+0x118/0x1a0 [iscsi_trgt] Nov 4 18:56:38 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:38 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:38 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:38 poulenc kernel: istiod1 D 00000000102262c8 0 8235 2 
Nov 4 18:56:38 poulenc kernel: Call Trace: Nov 4 18:56:38 poulenc kernel: [000000000067e4cc] wait_for_completion+0x74/0xe0 Nov 4 18:56:38 poulenc kernel: [00000000102262c8] blockio_make_request+0x1d0/0x24c [iscsi_trgt] Nov 4 18:56:38 poulenc kernel: [000000001021a140] tio_write+0x28/0x80 [iscsi_trgt] Nov 4 18:56:38 poulenc kernel: [0000000010223df8] build_write_response+0x40/0xe0 [iscsi_trgt] Nov 4 18:56:38 poulenc kernel: [000000001021e444] send_scsi_rsp+0xc/0x120 [iscsi_trgt] Nov 4 18:56:38 poulenc kernel: [0000000010223c30] disk_execute_cmnd+0x158/0x220 [iscsi_trgt] Nov 4 18:56:38 poulenc kernel: [0000000010220330] worker_thread+0x118/0x1a0 [iscsi_trgt] Nov 4 18:56:38 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:38 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:38 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:38 poulenc kernel: istiod1 D 00000000102262c8 0 8236 2 Nov 4 18:56:38 poulenc kernel: Call Trace: Nov 4 18:56:38 poulenc kernel: [000000000067e4cc] wait_for_completion+0x74/0xe0 Nov 4 18:56:38 poulenc kernel: [00000000102262c8] blockio_make_request+0x1d0/0x24c [iscsi_trgt] Nov 4 18:56:38 poulenc kernel: [000000001021a140] tio_write+0x28/0x80 [iscsi_trgt] Nov 4 18:56:38 poulenc kernel: [0000000010223df8] build_write_response+0x40/0xe0 [iscsi_trgt] Nov 4 18:56:38 poulenc kernel: [000000001021e444] send_scsi_rsp+0xc/0x120 [iscsi_trgt] Nov 4 18:56:38 poulenc kernel: [0000000010223c30] disk_execute_cmnd+0x158/0x220 [iscsi_trgt] Nov 4 18:56:38 poulenc kernel: [0000000010220330] worker_thread+0x118/0x1a0 [iscsi_trgt] Nov 4 18:56:38 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:38 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:38 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:38 poulenc kernel: identd S 00000000004c776c 0 8394 3802 Nov 4 18:56:38 poulenc kernel: Call Trace: Nov 4 18:56:38 poulenc kernel: [000000000067ec0c] 
schedule_timeout+0x54/0xc0 Nov 4 18:56:38 poulenc kernel: [00000000004c776c] do_sys_poll+0x234/0x400 Nov 4 18:56:38 poulenc kernel: [00000000004c7960] sys_poll+0x28/0x60 Nov 4 18:56:38 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:38 poulenc kernel: [00000000f7ce50d4] 0xf7ce50dc Nov 4 18:56:38 poulenc kernel: identd S 00000000004833bc 0 8395 3802 Nov 4 18:56:38 poulenc kernel: Call Trace: Nov 4 18:56:38 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:38 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:38 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:38 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:38 poulenc kernel: [00000000f7ee72e4] 0xf7ee72ec Nov 4 18:56:38 poulenc kernel: identd S 00000000004833bc 0 8396 3802 Nov 4 18:56:38 poulenc kernel: Call Trace: Nov 4 18:56:38 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:38 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:38 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:38 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:38 poulenc kernel: [00000000f7ee72e4] 0xf7ee72ec Nov 4 18:56:38 poulenc kernel: identd S 00000000004833bc 0 8397 3802 Nov 4 18:56:38 poulenc kernel: Call Trace: Nov 4 18:56:39 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:39 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:39 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:39 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:39 poulenc kernel: [00000000f7ee72e4] 0xf7ee72ec Nov 4 18:56:39 poulenc kernel: identd S 00000000004833bc 0 8398 3802 Nov 4 18:56:39 poulenc kernel: Call Trace: Nov 4 18:56:39 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:39 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 
4 18:56:39 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:39 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:39 poulenc kernel: [00000000f7ee72e4] 0xf7ee72ec Nov 4 18:56:39 poulenc kernel: identd S 00000000004833bc 0 8399 3802 Nov 4 18:56:39 poulenc kernel: Call Trace: Nov 4 18:56:39 poulenc kernel: [0000000000482ec0] futex_wait+0x268/0x2c0 Nov 4 18:56:39 poulenc kernel: [00000000004833bc] do_futex+0x64/0xbc0 Nov 4 18:56:39 poulenc kernel: [00000000004843fc] compat_sys_futex+0x64/0x120 Nov 4 18:56:39 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:39 poulenc kernel: [00000000f7ee72e4] 0xf7ee72ec Nov 4 18:56:39 poulenc kernel: watchdog ? 0000000000466d4c 0 8412 3976 Nov 4 18:56:39 poulenc kernel: Call Trace: Nov 4 18:56:39 poulenc kernel: [00000000004669f0] do_exit+0x6d8/0xa00 Nov 4 18:56:39 poulenc kernel: [0000000000466d4c] do_group_exit+0x34/0xa0 Nov 4 18:56:39 poulenc kernel: [00000000004061d4] linux_sparc_syscall32+0x3c/0x40 Nov 4 18:56:39 poulenc kernel: [000000000001b264] 0x1b26c Nov 4 18:56:46 poulenc kernel: SysRq : Show Blocked State Nov 4 18:56:46 poulenc kernel: task PC stack pid father Nov 4 18:56:46 poulenc kernel: istiod1 D 00000000102262c8 0 8229 2 Nov 4 18:56:46 poulenc kernel: Call Trace: Nov 4 18:56:46 poulenc kernel: [000000000067e4cc] wait_for_completion+0x74/0xe0 Nov 4 18:56:46 poulenc kernel: [00000000102262c8] blockio_make_request+0x1d0/0x24c [iscsi_trgt] Nov 4 18:56:46 poulenc kernel: [000000001021a140] tio_write+0x28/0x80 [iscsi_trgt] Nov 4 18:56:46 poulenc kernel: [0000000010223df8] build_write_response+0x40/0xe0 [iscsi_trgt] Nov 4 18:56:47 poulenc kernel: [000000001021e444] send_scsi_rsp+0xc/0x120 [iscsi_trgt] Nov 4 18:56:47 poulenc kernel: [0000000010223c30] disk_execute_cmnd+0x158/0x220 [iscsi_trgt] Nov 4 18:56:47 poulenc kernel: [0000000010220330] worker_thread+0x118/0x1a0 [iscsi_trgt] Nov 4 18:56:47 poulenc kernel: [0000000000478ce0] 
kthread+0x48/0x80 Nov 4 18:56:47 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:47 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:47 poulenc kernel: istiod1 D 00000000102262c8 0 8230 2 Nov 4 18:56:47 poulenc kernel: Call Trace: Nov 4 18:56:47 poulenc kernel: [000000000067e4cc] wait_for_completion+0x74/0xe0 Nov 4 18:56:47 poulenc kernel: [00000000102262c8] blockio_make_request+0x1d0/0x24c [iscsi_trgt] Nov 4 18:56:47 poulenc kernel: [000000001021a140] tio_write+0x28/0x80 [iscsi_trgt] Nov 4 18:56:47 poulenc kernel: [0000000010223df8] build_write_response+0x40/0xe0 [iscsi_trgt] Nov 4 18:56:47 poulenc kernel: [000000001021e444] send_scsi_rsp+0xc/0x120 [iscsi_trgt] Nov 4 18:56:47 poulenc kernel: [0000000010223c30] disk_execute_cmnd+0x158/0x220 [iscsi_trgt] Nov 4 18:56:47 poulenc kernel: [0000000010220330] worker_thread+0x118/0x1a0 [iscsi_trgt] Nov 4 18:56:47 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:47 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:47 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:47 poulenc kernel: istiod1 D 00000000102262c8 0 8231 2 Nov 4 18:56:47 poulenc kernel: Call Trace: Nov 4 18:56:47 poulenc kernel: [000000000067e4cc] wait_for_completion+0x74/0xe0 Nov 4 18:56:47 poulenc kernel: [00000000102262c8] blockio_make_request+0x1d0/0x24c [iscsi_trgt] Nov 4 18:56:47 poulenc kernel: [000000001021a140] tio_write+0x28/0x80 [iscsi_trgt] Nov 4 18:56:47 poulenc kernel: [0000000010223df8] build_write_response+0x40/0xe0 [iscsi_trgt] Nov 4 18:56:47 poulenc kernel: [000000001021e444] send_scsi_rsp+0xc/0x120 [iscsi_trgt] Nov 4 18:56:47 poulenc kernel: [0000000010223c30] disk_execute_cmnd+0x158/0x220 [iscsi_trgt] Nov 4 18:56:47 poulenc kernel: [0000000010220330] worker_thread+0x118/0x1a0 [iscsi_trgt] Nov 4 18:56:47 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:47 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:47 
poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:47 poulenc kernel: istiod1 D 00000000102262c8 0 8232 2 Nov 4 18:56:47 poulenc kernel: Call Trace: Nov 4 18:56:47 poulenc kernel: [000000000067e4cc] wait_for_completion+0x74/0xe0 Nov 4 18:56:47 poulenc kernel: [00000000102262c8] blockio_make_request+0x1d0/0x24c [iscsi_trgt] Nov 4 18:56:47 poulenc kernel: [000000001021a140] tio_write+0x28/0x80 [iscsi_trgt] Nov 4 18:56:47 poulenc kernel: [0000000010223df8] build_write_response+0x40/0xe0 [iscsi_trgt] Nov 4 18:56:47 poulenc kernel: [000000001021e444] send_scsi_rsp+0xc/0x120 [iscsi_trgt] Nov 4 18:56:48 poulenc kernel: [0000000010223c30] disk_execute_cmnd+0x158/0x220 [iscsi_trgt] Nov 4 18:56:48 poulenc kernel: [0000000010220330] worker_thread+0x118/0x1a0 [iscsi_trgt] Nov 4 18:56:48 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:48 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:48 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:48 poulenc kernel: istiod1 D 00000000102262c8 0 8233 2 Nov 4 18:56:48 poulenc kernel: Call Trace: Nov 4 18:56:48 poulenc kernel: [000000000067e4cc] wait_for_completion+0x74/0xe0 Nov 4 18:56:48 poulenc kernel: [00000000102262c8] blockio_make_request+0x1d0/0x24c [iscsi_trgt] Nov 4 18:56:48 poulenc kernel: [000000001021a140] tio_write+0x28/0x80 [iscsi_trgt] Nov 4 18:56:48 poulenc kernel: [0000000010223df8] build_write_response+0x40/0xe0 [iscsi_trgt] Nov 4 18:56:48 poulenc kernel: [000000001021e444] send_scsi_rsp+0xc/0x120 [iscsi_trgt] Nov 4 18:56:48 poulenc kernel: [0000000010223c30] disk_execute_cmnd+0x158/0x220 [iscsi_trgt] Nov 4 18:56:48 poulenc kernel: [0000000010220330] worker_thread+0x118/0x1a0 [iscsi_trgt] Nov 4 18:56:48 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:48 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:48 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:48 poulenc kernel: istiod1 D 
00000000102262c8 0 8234 2 Nov 4 18:56:48 poulenc kernel: Call Trace: Nov 4 18:56:48 poulenc kernel: [000000000067e4cc] wait_for_completion+0x74/0xe0 Nov 4 18:56:48 poulenc kernel: [00000000102262c8] blockio_make_request+0x1d0/0x24c [iscsi_trgt] Nov 4 18:56:48 poulenc kernel: [000000001021a140] tio_write+0x28/0x80 [iscsi_trgt] Nov 4 18:56:48 poulenc kernel: [0000000010223df8] build_write_response+0x40/0xe0 [iscsi_trgt] Nov 4 18:56:48 poulenc kernel: [000000001021e444] send_scsi_rsp+0xc/0x120 [iscsi_trgt] Nov 4 18:56:48 poulenc kernel: [0000000010223c30] disk_execute_cmnd+0x158/0x220 [iscsi_trgt] Nov 4 18:56:48 poulenc kernel: [0000000010220330] worker_thread+0x118/0x1a0 [iscsi_trgt] Nov 4 18:56:48 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:48 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:48 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:48 poulenc kernel: istiod1 D 00000000102262c8 0 8235 2 Nov 4 18:56:48 poulenc kernel: Call Trace: Nov 4 18:56:48 poulenc kernel: [000000000067e4cc] wait_for_completion+0x74/0xe0 Nov 4 18:56:48 poulenc kernel: [00000000102262c8] blockio_make_request+0x1d0/0x24c [iscsi_trgt] Nov 4 18:56:48 poulenc kernel: [000000001021a140] tio_write+0x28/0x80 [iscsi_trgt] Nov 4 18:56:48 poulenc kernel: [0000000010223df8] build_write_response+0x40/0xe0 [iscsi_trgt] Nov 4 18:56:48 poulenc kernel: [000000001021e444] send_scsi_rsp+0xc/0x120 [iscsi_trgt] Nov 4 18:56:48 poulenc kernel: [0000000010223c30] disk_execute_cmnd+0x158/0x220 [iscsi_trgt] Nov 4 18:56:48 poulenc kernel: [0000000010220330] worker_thread+0x118/0x1a0 [iscsi_trgt] Nov 4 18:56:48 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:48 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:48 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 Nov 4 18:56:49 poulenc kernel: istiod1 D 00000000102262c8 0 8236 2 Nov 4 18:56:49 poulenc kernel: Call Trace: Nov 4 18:56:49 poulenc kernel: 
[000000000067e4cc] wait_for_completion+0x74/0xe0 Nov 4 18:56:49 poulenc kernel: [00000000102262c8] blockio_make_request+0x1d0/0x24c [iscsi_trgt] Nov 4 18:56:49 poulenc kernel: [000000001021a140] tio_write+0x28/0x80 [iscsi_trgt] Nov 4 18:56:49 poulenc kernel: [0000000010223df8] build_write_response+0x40/0xe0 [iscsi_trgt] Nov 4 18:56:49 poulenc kernel: [000000001021e444] send_scsi_rsp+0xc/0x120 [iscsi_trgt] Nov 4 18:56:49 poulenc kernel: [0000000010223c30] disk_execute_cmnd+0x158/0x220 [iscsi_trgt] Nov 4 18:56:49 poulenc kernel: [0000000010220330] worker_thread+0x118/0x1a0 [iscsi_trgt] Nov 4 18:56:49 poulenc kernel: [0000000000478ce0] kthread+0x48/0x80 Nov 4 18:56:49 poulenc kernel: [00000000004273d0] kernel_thread+0x38/0x60 Nov 4 18:56:49 poulenc kernel: [0000000000478f80] kthreadd+0x148/0x1c0 From owner-xfs@oss.sgi.com Sun Nov 4 14:01:50 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 04 Nov 2007 14:01:58 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-0.2 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_33 autolearn=no version=3.3.0-r574664 Received: from mail.ukfsn.org (s2.ukfsn.org [217.158.120.143]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA4M1nsb022151 for ; Sun, 4 Nov 2007 14:01:50 -0800 Received: from localhost (localhost [127.0.0.1]) by mail.ukfsn.org (Postfix) with ESMTP id 25A19DECC0; Sun, 4 Nov 2007 21:44:59 +0000 (GMT) Received: from mail.ukfsn.org ([127.0.0.1]) by localhost (smtp-filter.ukfsn.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id PycALoFNZBT8; Sun, 4 Nov 2007 21:44:58 +0000 (GMT) Received: from elm.dgreaves.com (i-83-67-36-194.freedom2surf.net [83.67.36.194]) by mail.ukfsn.org (Postfix) with ESMTP id D4000DED3E; Sun, 4 Nov 2007 21:44:09 +0000 (GMT) Received: from ash.dgreaves.com ([10.0.0.90]) by elm.dgreaves.com with esmtp (Exim 4.62) (envelope-from ) id 1IonCp-0002DL-4r; Sun, 04 Nov 2007 21:40:31 +0000 Message-ID: 
<472E3C4B.5010904@dgreaves.com> Date: Sun, 04 Nov 2007 21:40:27 +0000 From: David Greaves User-Agent: Mozilla-Thunderbird 2.0.0.6 (X11/20071009) MIME-Version: 1.0 To: Michael Tokarev CC: Justin Piszcz , linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, xfs@oss.sgi.com Subject: Re: 2.6.23.1: mdadm/raid5 hung/d-state References: <472DBF8C.2060508@msgid.tls.msk.ru> <472DDD78.7040002@msgid.tls.msk.ru> In-Reply-To: <472DDD78.7040002@msgid.tls.msk.ru> X-Enigmail-Version: 0.95.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4672/Sun Nov 4 03:38:42 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13547 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: david@dgreaves.com Precedence: bulk X-list: xfs Michael Tokarev wrote: > Justin Piszcz wrote: >> On Sun, 4 Nov 2007, Michael Tokarev wrote: > [] >>> The next time you come across something like that, do a SysRq-T dump and >>> post that. It shows a stack trace of all processes - and in particular, >>> where exactly each task is stuck. > >> Yes I got it before I rebooted, ran that and then dmesg > file. >> >> Here it is: >> >> [1172609.665902] ffffffff80747dc0 ffffffff80747dc0 ffffffff80747dc0 ffffffff80744d80 >> [1172609.668768] ffffffff80747dc0 ffff81015c3aa918 ffff810091c899b4 ffff810091c899a8 > > That's only a partial list. All the kernel threads - which are most important > in this context - aren't shown. You ran out of dmesg buffer, and the most > interesting entries were at the beginning. If your /var/log partition is > working, the stuff should be in /var/log/kern.log or equivalent. If it's > not working, there is a way to capture the info still, by stopping syslogd, > cat'ing /proc/kmsg to some tmpfs file and scp'ing it elsewhere.
or netconsole is actually pretty easy and incredibly useful in this kind of situation even if there's no disk at all :) David From owner-xfs@oss.sgi.com Sun Nov 4 16:01:14 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 04 Nov 2007 16:01:21 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=BAYES_50,DATE_IN_PAST_12_24, MIME_QP_LONG_LINE,RCVD_IN_DNSWL_MED,SPF_HELO_PASS autolearn=ham version=3.3.0-r574664 Received: from b.mx.filmlight.ltd.uk (bongo.filmlight.ltd.uk [217.40.27.26]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA501Cbf001302 for ; Sun, 4 Nov 2007 16:01:13 -0800 Received: (dqd 14871 invoked from network); 5 Nov 2007 00:01:16 -0000 Received: from unknown (HELO BODDINGTON) (roger@62.49.60.134) by b.mx.filmlight.ltd.uk with SMTP; 5 Nov 2007 00:01:16 -0000 Message-ID: <000001c81f3e$eff344b0$6501a8c0@BODDINGTON> From: "Roger Willcocks" To: "Timothy Shimmin" Cc: References: <47249E7A.7060709@filmlight.ltd.uk> <47252F62.6030503@sgi.com> <47262CD0.5010708@filmlight.ltd.uk> <4726ADAE.9070206@sgi.com> <472769A1.5090605@filmlight.ltd.uk> <472A7940.5070800@sgi.com> Subject: Re: bug: truncate to zero + setuid Date: Sun, 4 Nov 2007 11:59:51 -0000 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_NextPart_000_0645_01C81EDA.353308E0" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.3138 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3198 X-Virus-Scanned: ClamAV 0.91.2/4673/Sun Nov 4 14:22:25 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13548 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: roger@filmlight.ltd.uk Precedence: bulk X-list: xfs This is a multi-part message in MIME format. 
------=_NextPart_000_0645_01C81EDA.353308E0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response Content-Transfer-Encoding: 7bit Timothy Shimmin wrote: > Hi Roger, > ... > I don't like all these inconsistencies. Take a look at the attached patch relative to the current cvs (it's a bit big to put inline). The basic problem is it's currently unclear when to set the times from va_atime etc. and when to set them to the current time. So I've used the already defined XFS_AT_UPDxTIME flags to indicate that a time should be set to 'now' and XFS_AT_xTIME to mean set it using va_xtime. This seems to fit well with the current code and I wonder if that's how it was meant to work in the first place. I've also removed the now redundant ATTR_UTIME flag and pulled the null truncate to the top, which simplifies things. One query: in both xfs_iops.c/xfs_vn_setattr and xfs_dm.c/xfs_dm_set_fileattr the ATIME branch sets the inode's atime directly. This is probably something to do with the comment above xfs_iops.c/xfs_ichgtime ('to make sure the access time update will take') but it could probably be handled better. > BTW, your locking looks wrong - it appears you don't unlock when the > file is non-zero size. Oops... 
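To make the intended convention concrete, here is a rough userspace sketch of it. The DEMO_* names, flag values and struct are made up for illustration; they are not the real XFS_AT_* definitions or the xfs_setattr code. The assumption is simply that a "set" flag means "use the supplied value", an "upd" flag means "use the current time", and neither means "leave the timestamp alone".

```c
#include <assert.h>
#include <time.h>

/* Hypothetical flag values for illustration only; not the real XFS_AT_* bits. */
#define DEMO_AT_ATIME     0x01u  /* set atime from the caller-supplied value */
#define DEMO_AT_MTIME     0x02u  /* set mtime from the caller-supplied value */
#define DEMO_AT_UPDATIME  0x10u  /* set atime to 'now' */
#define DEMO_AT_UPDMTIME  0x20u  /* set mtime to 'now' */

/* Hypothetical stand-in for the timestamps kept in an inode. */
struct demo_times {
	time_t atime;
	time_t mtime;
};

/* Pick the new value for one timestamp given its set/upd flag pair. */
static time_t
demo_pick_time(unsigned mask, unsigned set_flag, unsigned upd_flag,
	       time_t old, time_t supplied, time_t now)
{
	/* Asking for both an explicit value and 'now' makes no sense. */
	assert(!((mask & set_flag) && (mask & upd_flag)));

	if (mask & set_flag)
		return supplied;	/* utime(2)-style explicit time */
	if (mask & upd_flag)
		return now;		/* "touch"-style update to current time */
	return old;			/* flag pair absent: leave unchanged */
}

/* Apply the mask to both timestamps in one go. */
static void
demo_apply_times(struct demo_times *cur, unsigned mask,
		 const struct demo_times *supplied, time_t now)
{
	cur->atime = demo_pick_time(mask, DEMO_AT_ATIME, DEMO_AT_UPDATIME,
				    cur->atime, supplied->atime, now);
	cur->mtime = demo_pick_time(mask, DEMO_AT_MTIME, DEMO_AT_UPDMTIME,
				    cur->mtime, supplied->mtime, now);
}
```

With this shape the caller never needs a separate "this is a utime request" flag: the mask alone says where each time comes from.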
-- Roger ------=_NextPart_000_0645_01C81EDA.353308E0 Content-Type: application/octet-stream; name="xfs_setattr.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="xfs_setattr.patch" diff -ur xfs.orig/linux-2.6/xfs_iops.c xfs/linux-2.6/xfs_iops.c --- xfs.orig/linux-2.6/xfs_iops.c 2007-11-04 10:59:00.923480296 +0000 +++ xfs/linux-2.6/xfs_iops.c 2007-11-04 12:43:15.702609336 +0000 @@ -655,13 +655,21 @@ vattr.va_size =3D attr->ia_size; } if (ia_valid & ATTR_ATIME) { - vattr.va_mask |=3D XFS_AT_ATIME; - vattr.va_atime =3D attr->ia_atime; - inode->i_atime =3D attr->ia_atime; + if (ia_valid & ATTR_ATIME_SET) { + vattr.va_mask |=3D XFS_AT_ATIME; + vattr.va_atime =3D attr->ia_atime; + inode->i_atime =3D attr->ia_atime; + } else { + vattr.va_mask |=3D XFS_AT_UPDATIME; + } } if (ia_valid & ATTR_MTIME) { - vattr.va_mask |=3D XFS_AT_MTIME; - vattr.va_mtime =3D attr->ia_mtime; + if (ia_valid & ATTR_MTIME_SET) { + vattr.va_mask |=3D XFS_AT_MTIME; + vattr.va_mtime =3D attr->ia_mtime; + } else { + vattr.va_mask |=3D XFS_AT_UPDMTIME; + } } if (ia_valid & ATTR_CTIME) { vattr.va_mask |=3D XFS_AT_CTIME; @@ -674,8 +682,6 @@ inode->i_mode &=3D ~S_ISGID; } =20 - if (ia_valid & (ATTR_MTIME_SET | ATTR_ATIME_SET)) - flags |=3D ATTR_UTIME; #ifdef ATTR_NO_BLOCK if ((ia_valid & ATTR_NO_BLOCK)) flags |=3D ATTR_NONBLOCK; diff -ur xfs.orig/linux-2.6/xfs_vnode.h xfs/linux-2.6/xfs_vnode.h --- xfs.orig/linux-2.6/xfs_vnode.h 2007-11-04 10:59:00.923480296 +0000 +++ xfs/linux-2.6/xfs_vnode.h 2007-11-04 11:01:33.338309720 +0000 @@ -270,7 +270,6 @@ /* * Flags to vop_setattr/getattr. 
*/ -#define ATTR_UTIME 0x01 /* non-default utime(2) request */ #define ATTR_DMI 0x08 /* invocation from a DMI function */ #define ATTR_LAZY 0x80 /* set/get attributes lazily */ #define ATTR_NONBLOCK 0x100 /* return EAGAIN if operation would block */ diff -ur xfs.orig/xfs_vnodeops.c xfs/xfs_vnodeops.c --- xfs.orig/xfs_vnodeops.c 2007-11-04 10:59:00.917481208 +0000 +++ xfs/xfs_vnodeops.c 2007-11-04 12:07:44.917537904 +0000 @@ -214,10 +214,10 @@ { bhv_vnode_t *vp =3D XFS_ITOV(ip); xfs_mount_t *mp =3D ip->i_mount; - xfs_trans_t *tp; + xfs_trans_t *tp =3D NULL; int mask; int code; - uint lock_flags; + uint lock_flags=3D0; uint commit_flags=3D0; uid_t uid=3D0, iuid=3D0; gid_t gid=3D0, igid=3D0; @@ -244,21 +244,51 @@ if (XFS_FORCED_SHUTDOWN(mp)) return XFS_ERROR(EIO); =20 + olddquot1 =3D olddquot2 =3D NULL; + udqp =3D gdqp =3D NULL; + + /* + * Truncate is special because it changes the file as well as + * the attributes. + */ + if (mask & XFS_AT_SIZE) { + /* Must have write permission and not be a directory. */ + if (VN_ISDIR(vp)) { + code =3D XFS_ERROR(EISDIR); + goto error_return; + } else if (!VN_ISREG(vp)) { + code =3D XFS_ERROR(EINVAL); + goto error_return; + } + /* + * Short circuit the truncate case for zero length files. + */ + if (vap->va_size =3D=3D 0) { + xfs_ilock(ip, XFS_ILOCK_EXCL); + if ((ip->i_size =3D=3D 0) && (ip->i_d.di_nextents =3D=3D 0)) { + mask |=3D XFS_AT_UPDCTIME|XFS_AT_UPDMTIME; + mask &=3D ~XFS_AT_SIZE; + } + xfs_iunlock(ip, XFS_ILOCK_EXCL); + } + } + /* * Timestamps do not need to be logged and hence do not * need to be done within a transaction. */ if (mask & XFS_AT_UPDTIMES) { - ASSERT((mask & ~XFS_AT_UPDTIMES) =3D=3D 0); timeflags =3D ((mask & XFS_AT_UPDATIME) ? XFS_ICHGTIME_ACC : 0) | ((mask & XFS_AT_UPDCTIME) ? XFS_ICHGTIME_CHG : 0) | ((mask & XFS_AT_UPDMTIME) ? 
XFS_ICHGTIME_MOD : 0); - xfs_ichgtime(ip, timeflags); - return 0; + mask &=3D ~XFS_AT_UPDTIMES; } =20 - olddquot1 =3D olddquot2 =3D NULL; - udqp =3D gdqp =3D NULL; + if (mask =3D=3D 0) { + if (timeflags && !(flags & ATTR_DMI)) + xfs_ichgtime(ip, timeflags); + return 0; + } =20 /* * If disk quotas is on, we make sure that the dquots do exist on disk, @@ -307,12 +337,11 @@ * For the other attributes, we acquire the inode lock and * first do an error checking pass. */ - tp =3D NULL; lock_flags =3D XFS_ILOCK_EXCL; if (flags & ATTR_NOLOCK) need_iolock =3D 0; if (!(mask & XFS_AT_SIZE)) { - if ((mask !=3D (XFS_AT_CTIME|XFS_AT_ATIME|XFS_AT_MTIME)) || + if ((mask & ~(XFS_AT_CTIME|XFS_AT_ATIME|XFS_AT_MTIME)) !=3D 0 || (mp->m_flags & XFS_MOUNT_WSYNC)) { tp =3D xfs_trans_alloc(mp, XFS_TRANS_SETATTR_NOT_SIZE); commit_flags =3D 0; @@ -451,24 +480,6 @@ * Truncate file. Must have write permission and not be a directory. */ if (mask & XFS_AT_SIZE) { - /* Short circuit the truncate case for zero length files */ - if ((vap->va_size =3D=3D 0) && - (ip->i_size =3D=3D 0) && (ip->i_d.di_nextents =3D=3D 0)) { - xfs_iunlock(ip, XFS_ILOCK_EXCL); - lock_flags &=3D ~XFS_ILOCK_EXCL; - if (mask & XFS_AT_CTIME) - xfs_ichgtime(ip, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG); - code =3D 0; - goto error_return; - } - - if (VN_ISDIR(vp)) { - code =3D XFS_ERROR(EISDIR); - goto error_return; - } else if (!VN_ISREG(vp)) { - code =3D XFS_ERROR(EINVAL); - goto error_return; - } /* * Make sure that the dquots are attached to the inode. 
*/ @@ -481,8 +492,7 @@ */ if (mask & (XFS_AT_ATIME|XFS_AT_MTIME)) { if (!file_owner) { - if ((flags & ATTR_UTIME) && - !capable(CAP_FOWNER)) { + if (!capable(CAP_FOWNER)) { code =3D XFS_ERROR(EPERM); goto error_return; } @@ -760,7 +770,7 @@ timeflags &=3D ~XFS_ICHGTIME_MOD; timeflags |=3D XFS_ICHGTIME_CHG; } - if (tp && (flags & ATTR_UTIME)) + if (tp) xfs_trans_log_inode (tp, ip, XFS_ILOG_CORE); } =20 ------=_NextPart_000_0645_01C81EDA.353308E0-- From owner-xfs@oss.sgi.com Sun Nov 4 17:45:44 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 04 Nov 2007 17:45:49 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.5 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_62, J_CHICKENPOX_63 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA51jb0s013123 for ; Sun, 4 Nov 2007 17:45:41 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA14744; Mon, 5 Nov 2007 12:45:24 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lA51jKdD94077391; Mon, 5 Nov 2007 12:45:21 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lA51jAW894285231; Mon, 5 Nov 2007 12:45:10 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Mon, 5 Nov 2007 12:45:10 +1100 From: David Chinner To: Torsten Kaiser Cc: David Chinner , Peter Zijlstra , Fengguang Wu , Maxim Levitsky , linux-kernel@vger.kernel.org, Andrew Morton , linux-fsdevel@vger.kernel.org, xfs@oss.sgi.com Subject: Re: writeout stalls in current -git Message-ID: <20071105014510.GU66820511@sgi.com> References: <393060478.03650@ustc.edu.cn> 
<64bb37e0710310822r5ca6b793p8fd97db2f72a8655@mail.gmail.com> <393903856.06449@ustc.edu.cn> <64bb37e0711011120i63cdfe3ci18995d57b6649a8@mail.gmail.com> <64bb37e0711011200n228e708eg255640388f83da22@mail.gmail.com> <1193998532.27652.343.camel@twins> <64bb37e0711021222q7d12c825mc62d433c4fe19e8@mail.gmail.com> <20071102204258.GR995458@sgi.com> <64bb37e0711040319l5de285c3xea64474540a51b6e@mail.gmail.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="7ZAtKRhVyVSsbBD2" Content-Disposition: inline In-Reply-To: <64bb37e0711040319l5de285c3xea64474540a51b6e@mail.gmail.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4673/Sun Nov 4 14:22:25 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13549 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs --7ZAtKRhVyVSsbBD2 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Sun, Nov 04, 2007 at 12:19:19PM +0100, Torsten Kaiser wrote: > On 11/2/07, David Chinner wrote: > > That's stalled waiting on the inode cluster buffer lock. That implies > > that the inode cluster is already being written out and the inode has > > been redirtied during writeout. > > > > Does the kernel you are testing have the "flush inodes in ascending > > inode number order" patches applied? If so, can you remove that > > patch and see if the problem goes away? > > I can now confirm, that I see this also with the current mainline-git-version > I used 2.6.24-rc1-git-b4f555081fdd27d13e6ff39d455d5aefae9d2c0c > plus the fix for the sg changes in ieee1394. Ok, so it's probably a side effect of the writeback changes. Attached are two patches (two because one was in a separate patchset as a standalone change) that should prevent async writeback from blocking on locked inode cluster buffers. Apply the xfs-factor-inotobp patch first. Can you see if this fixes the problem? Cheers, Dave.
-- Dave Chinner Principal Engineer SGI Australian Software Group --7ZAtKRhVyVSsbBD2 Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=xfs-factor-inotobp --- fs/xfs/xfs_inode.c | 283 ++++++++++++++++++++++++----------------------------- 1 file changed, 129 insertions(+), 154 deletions(-) Index: 2.6.x-xfs-new/fs/xfs/xfs_inode.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_inode.c 2007-09-12 15:41:22.000000000 +1000 +++ 2.6.x-xfs-new/fs/xfs/xfs_inode.c 2007-09-13 08:57:06.395641940 +1000 @@ -124,6 +124,126 @@ xfs_inobp_check( #endif /* + * Simple wrapper for calling xfs_imap() that includes error + * and bounds checking + */ +STATIC int +xfs_ino_to_imap( + xfs_mount_t *mp, + xfs_trans_t *tp, + xfs_ino_t ino, + xfs_imap_t *imap, + uint imap_flags) +{ + int error; + + error = xfs_imap(mp, tp, ino, imap, imap_flags); + if (error) { + cmn_err(CE_WARN, "xfs_ino_to_imap: xfs_imap() returned an " + "error %d on %s. Returning error.", + error, mp->m_fsname); + return error; + } + + /* + * If the inode number maps to a block outside the bounds + * of the file system then return NULL rather than calling + * read_buf and panicing when we get an error from the + * driver. + */ + if ((imap->im_blkno + imap->im_len) > + XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks)) { + xfs_fs_cmn_err(CE_ALERT, mp, "xfs_ino_to_imap: " + "(imap->im_blkno (0x%llx) + imap->im_len (0x%llx)) > " + " XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) (0x%llx)", + (unsigned long long) imap->im_blkno, + (unsigned long long) imap->im_len, + XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks)); + return XFS_ERROR(EINVAL); + } + return 0; +} + +/* + * Find the buffer associated with the given inode map + * We do basic validation checks on the buffer once it has been + * retrieved from disk. 
+ */ +STATIC int +xfs_imap_to_bp( + xfs_mount_t *mp, + xfs_trans_t *tp, + xfs_imap_t *imap, + xfs_buf_t **bpp, + uint buf_flags, + uint imap_flags) +{ + int error; + int i; + int ni; + xfs_buf_t *bp; + + error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp, imap->im_blkno, + (int)imap->im_len, XFS_BUF_LOCK, &bp); + if (error) { + cmn_err(CE_WARN, "xfs_imap_to_bp: xfs_trans_read_buf()returned " + "an error %d on %s. Returning error.", + error, mp->m_fsname); + return error; + } + + /* + * Validate the magic number and version of every inode in the buffer + * (if DEBUG kernel) or the first inode in the buffer, otherwise. + */ +#ifdef DEBUG + ni = BBTOB(imap->im_len) >> mp->m_sb.sb_inodelog; +#else /* usual case */ + ni = 1; +#endif + + for (i = 0; i < ni; i++) { + int di_ok; + xfs_dinode_t *dip; + + dip = (xfs_dinode_t *)xfs_buf_offset(bp, + (i << mp->m_sb.sb_inodelog)); + di_ok = be16_to_cpu(dip->di_core.di_magic) == XFS_DINODE_MAGIC && + XFS_DINODE_GOOD_VERSION(dip->di_core.di_version); + if (unlikely(XFS_TEST_ERROR(!di_ok, mp, + XFS_ERRTAG_ITOBP_INOTOBP, + XFS_RANDOM_ITOBP_INOTOBP))) { + if (imap_flags & XFS_IMAP_BULKSTAT) { + xfs_trans_brelse(tp, bp); + return XFS_ERROR(EINVAL); + } + XFS_CORRUPTION_ERROR("xfs_imap_to_bp", + XFS_ERRLEVEL_HIGH, mp, dip); +#ifdef DEBUG + cmn_err(CE_PANIC, + "Device %s - bad inode magic/vsn " + "daddr %lld #%d (magic=%x)", + XFS_BUFTARG_NAME(mp->m_ddev_targp), + (unsigned long long)imap->im_blkno, i, + be16_to_cpu(dip->di_core.di_magic)); +#endif + xfs_trans_brelse(tp, bp); + return XFS_ERROR(EFSCORRUPTED); + } + } + + xfs_inobp_check(mp, bp); + + /* + * Mark the buffer as an inode buffer now that it looks good + */ + XFS_BUF_SET_VTYPE(bp, B_FS_INO); + + *bpp = bp; + return 0; +} + +/* * This routine is called to map an inode number within a file * system to the buffer containing the on-disk version of the * inode. 
It returns a pointer to the buffer containing the @@ -145,72 +265,19 @@ xfs_inotobp( xfs_buf_t **bpp, int *offset) { - int di_ok; xfs_imap_t imap; xfs_buf_t *bp; int error; - xfs_dinode_t *dip; - /* - * Call the space management code to find the location of the - * inode on disk. - */ imap.im_blkno = 0; - error = xfs_imap(mp, tp, ino, &imap, XFS_IMAP_LOOKUP); - if (error != 0) { - cmn_err(CE_WARN, - "xfs_inotobp: xfs_imap() returned an " - "error %d on %s. Returning error.", error, mp->m_fsname); + error = xfs_ino_to_imap(mp, tp, ino, &imap, XFS_IMAP_LOOKUP); + if (error) return error; - } - - /* - * If the inode number maps to a block outside the bounds of the - * file system then return NULL rather than calling read_buf - * and panicing when we get an error from the driver. - */ - if ((imap.im_blkno + imap.im_len) > - XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks)) { - cmn_err(CE_WARN, - "xfs_inotobp: inode number (%llu + %d) maps to a block outside the bounds " - "of the file system %s. Returning EINVAL.", - (unsigned long long)imap.im_blkno, - imap.im_len, mp->m_fsname); - return XFS_ERROR(EINVAL); - } - - /* - * Read in the buffer. If tp is NULL, xfs_trans_read_buf() will - * default to just a read_buf() call. - */ - error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp, imap.im_blkno, - (int)imap.im_len, XFS_BUF_LOCK, &bp); - if (error) { - cmn_err(CE_WARN, - "xfs_inotobp: xfs_trans_read_buf() returned an " - "error %d on %s. 
Returning error.", error, mp->m_fsname); + error = xfs_imap_to_bp(mp, tp, &imap, &bp, XFS_BUF_LOCK, 0); + if (error) return error; - } - dip = (xfs_dinode_t *)xfs_buf_offset(bp, 0); - di_ok = - be16_to_cpu(dip->di_core.di_magic) == XFS_DINODE_MAGIC && - XFS_DINODE_GOOD_VERSION(dip->di_core.di_version); - if (unlikely(XFS_TEST_ERROR(!di_ok, mp, XFS_ERRTAG_ITOBP_INOTOBP, - XFS_RANDOM_ITOBP_INOTOBP))) { - XFS_CORRUPTION_ERROR("xfs_inotobp", XFS_ERRLEVEL_LOW, mp, dip); - xfs_trans_brelse(tp, bp); - cmn_err(CE_WARN, - "xfs_inotobp: XFS_TEST_ERROR() returned an " - "error on %s. Returning EFSCORRUPTED.", mp->m_fsname); - return XFS_ERROR(EFSCORRUPTED); - } - - xfs_inobp_check(mp, bp); - /* - * Set *dipp to point to the on-disk inode in the buffer. - */ *dipp = (xfs_dinode_t *)xfs_buf_offset(bp, imap.im_boffset); *bpp = bp; *offset = imap.im_boffset; @@ -251,41 +318,15 @@ xfs_itobp( xfs_imap_t imap; xfs_buf_t *bp; int error; - int i; - int ni; if (ip->i_blkno == (xfs_daddr_t)0) { - /* - * Call the space management code to find the location of the - * inode on disk. - */ imap.im_blkno = bno; - if ((error = xfs_imap(mp, tp, ip->i_ino, &imap, - XFS_IMAP_LOOKUP | imap_flags))) + error = xfs_ino_to_imap(mp, tp, ip->i_ino, &imap, + XFS_IMAP_LOOKUP | imap_flags); + if (error) return error; /* - * If the inode number maps to a block outside the bounds - * of the file system then return NULL rather than calling - * read_buf and panicing when we get an error from the - * driver. 
- */ - if ((imap.im_blkno + imap.im_len) > - XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks)) { -#ifdef DEBUG - xfs_fs_cmn_err(CE_ALERT, mp, "xfs_itobp: " - "(imap.im_blkno (0x%llx) " - "+ imap.im_len (0x%llx)) > " - " XFS_FSB_TO_BB(mp, " - "mp->m_sb.sb_dblocks) (0x%llx)", - (unsigned long long) imap.im_blkno, - (unsigned long long) imap.im_len, - XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks)); -#endif /* DEBUG */ - return XFS_ERROR(EINVAL); - } - - /* * Fill in the fields in the inode that will be used to * map the inode to its buffer from now on. */ @@ -303,76 +344,10 @@ xfs_itobp( } ASSERT(bno == 0 || bno == imap.im_blkno); - /* - * Read in the buffer. If tp is NULL, xfs_trans_read_buf() will - * default to just a read_buf() call. - */ - error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp, imap.im_blkno, - (int)imap.im_len, XFS_BUF_LOCK, &bp); - if (error) { -#ifdef DEBUG - xfs_fs_cmn_err(CE_ALERT, mp, "xfs_itobp: " - "xfs_trans_read_buf() returned error %d, " - "imap.im_blkno 0x%llx, imap.im_len 0x%llx", - error, (unsigned long long) imap.im_blkno, - (unsigned long long) imap.im_len); -#endif /* DEBUG */ + error = xfs_imap_to_bp(mp, tp, &imap, &bp, XFS_BUF_LOCK, imap_flags); + if (error) return error; - } - - /* - * Validate the magic number and version of every inode in the buffer - * (if DEBUG kernel) or the first inode in the buffer, otherwise. - * No validation is done here in userspace (xfs_repair). 
- */ -#if !defined(__KERNEL__) - ni = 0; -#elif defined(DEBUG) - ni = BBTOB(imap.im_len) >> mp->m_sb.sb_inodelog; -#else /* usual case */ - ni = 1; -#endif - - for (i = 0; i < ni; i++) { - int di_ok; - xfs_dinode_t *dip; - - dip = (xfs_dinode_t *)xfs_buf_offset(bp, - (i << mp->m_sb.sb_inodelog)); - di_ok = be16_to_cpu(dip->di_core.di_magic) == XFS_DINODE_MAGIC && - XFS_DINODE_GOOD_VERSION(dip->di_core.di_version); - if (unlikely(XFS_TEST_ERROR(!di_ok, mp, - XFS_ERRTAG_ITOBP_INOTOBP, - XFS_RANDOM_ITOBP_INOTOBP))) { - if (imap_flags & XFS_IMAP_BULKSTAT) { - xfs_trans_brelse(tp, bp); - return XFS_ERROR(EINVAL); - } -#ifdef DEBUG - cmn_err(CE_ALERT, - "Device %s - bad inode magic/vsn " - "daddr %lld #%d (magic=%x)", - XFS_BUFTARG_NAME(mp->m_ddev_targp), - (unsigned long long)imap.im_blkno, i, - be16_to_cpu(dip->di_core.di_magic)); -#endif - XFS_CORRUPTION_ERROR("xfs_itobp", XFS_ERRLEVEL_HIGH, - mp, dip); - xfs_trans_brelse(tp, bp); - return XFS_ERROR(EFSCORRUPTED); - } - } - - xfs_inobp_check(mp, bp); - /* - * Mark the buffer as an inode buffer now that it looks good - */ - XFS_BUF_SET_VTYPE(bp, B_FS_INO); - - /* - * Set *dipp to point to the on-disk inode in the buffer. 
- */ *dipp = (xfs_dinode_t *)xfs_buf_offset(bp, imap.im_boffset); *bpp = bp; return 0; --7ZAtKRhVyVSsbBD2 Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=xfs-iflush-blocking-fix --- fs/xfs/linux-2.6/xfs_super.c | 3 +- fs/xfs/linux-2.6/xfs_vnode.h | 5 --- fs/xfs/xfs_inode.c | 33 ++++++++++++++++--------- fs/xfs/xfs_inode.h | 7 +++-- fs/xfs/xfs_vnodeops.c | 55 +++++++++---------------------------------- 5 files changed, 41 insertions(+), 62 deletions(-) Index: 2.6.x-xfs-new/fs/xfs/xfs_inode.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_inode.c 2007-11-05 10:17:36.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_inode.c 2007-11-05 10:33:49.590268027 +1100 @@ -306,14 +306,15 @@ xfs_inotobp( * 0 for the disk block address. */ int -xfs_itobp( +xfs_itobp_flags( xfs_mount_t *mp, xfs_trans_t *tp, xfs_inode_t *ip, xfs_dinode_t **dipp, xfs_buf_t **bpp, xfs_daddr_t bno, - uint imap_flags) + uint imap_flags, + uint buf_flags) { xfs_imap_t imap; xfs_buf_t *bp; @@ -344,10 +345,17 @@ xfs_itobp( } ASSERT(bno == 0 || bno == imap.im_blkno); - error = xfs_imap_to_bp(mp, tp, &imap, &bp, XFS_BUF_LOCK, imap_flags); + error = xfs_imap_to_bp(mp, tp, &imap, &bp, buf_flags, imap_flags); if (error) return error; + if (!bp) { + ASSERT(buf_flags & XFS_BUF_TRYLOCK); + ASSERT(tp == NULL); + *bpp = NULL; + return EAGAIN; + } + *dipp = (xfs_dinode_t *)xfs_buf_offset(bp, imap.im_boffset); *bpp = bp; return 0; @@ -3068,15 +3076,6 @@ xfs_iflush( } /* - * Get the buffer containing the on-disk inode. - */ - error = xfs_itobp(mp, NULL, ip, &dip, &bp, 0, 0); - if (error) { - xfs_ifunlock(ip); - return error; - } - - /* * Decide how buffer will be flushed out. This is done before * the call to xfs_iflush_int because this field is zeroed by it. */ @@ -3125,6 +3124,16 @@ xfs_iflush( } /* + * Get the buffer containing the on-disk inode. 
+ */ + error = xfs_itobp_flags(mp, NULL, ip, &dip, &bp, 0, 0, + (flags == INT_ASYNC) ? XFS_BUF_TRYLOCK : XFS_BUF_LOCK); + if (error ||!bp) { + xfs_ifunlock(ip); + return error; + } + + /* * First flush out the inode that xfs_iflush was called with. */ error = xfs_iflush_int(ip, bp); Index: 2.6.x-xfs-new/fs/xfs/xfs_inode.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_inode.h 2007-11-02 13:44:46.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_inode.h 2007-11-05 10:25:44.885153248 +1100 @@ -488,9 +488,12 @@ int xfs_finish_reclaim_all(struct xfs_m /* * xfs_inode.c prototypes. */ -int xfs_itobp(struct xfs_mount *, struct xfs_trans *, +int xfs_itobp_flags(struct xfs_mount *, struct xfs_trans *, xfs_inode_t *, struct xfs_dinode **, struct xfs_buf **, - xfs_daddr_t, uint); + xfs_daddr_t, uint, uint); +#define xfs_itobp(mp, tp, ip, dipp, bpp, bno, iflags) \ + xfs_itobp_flags(mp, tp, ip, dipp, bpp, bno, iflags, XFS_BUF_LOCK) + int xfs_iread(struct xfs_mount *, struct xfs_trans *, xfs_ino_t, xfs_inode_t **, xfs_daddr_t, uint); int xfs_iread_extents(struct xfs_trans *, xfs_inode_t *, int); Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_super.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_super.c 2007-11-02 13:44:50.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_super.c 2007-11-05 10:39:05.969204451 +1100 @@ -840,7 +840,8 @@ xfs_fs_write_inode( struct inode *inode, int sync) { - int error = 0, flags = FLUSH_INODE; + int error = 0; + int flags = 0; xfs_itrace_entry(XFS_I(inode)); if (sync) { Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_vnode.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_vnode.h 2007-10-02 16:01:47.000000000 +1000 +++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_vnode.h 2007-11-05 10:40:49.103817818 +1100 @@ -73,12 +73,9 @@ typedef enum bhv_vrwlock { #define IO_INVIS 0x00020 /* don't 
update inode timestamps */ /* - * Flags for vop_iflush call + * Flags for xfs_inode_flush */ #define FLUSH_SYNC 1 /* wait for flush to complete */ -#define FLUSH_INODE 2 /* flush the inode itself */ -#define FLUSH_LOG 4 /* force the last log entry for - * this inode out to disk */ /* * Flush/Invalidate options for vop_toss/flush/flushinval_pages. Index: 2.6.x-xfs-new/fs/xfs/xfs_vnodeops.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_vnodeops.c 2007-11-05 10:02:05.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_vnodeops.c 2007-11-05 10:37:53.398623943 +1100 @@ -3556,29 +3556,6 @@ xfs_inode_flush( ((iip == NULL) || !(iip->ili_format.ilf_fields & XFS_ILOG_ALL))) return 0; - if (flags & FLUSH_LOG) { - if (iip && iip->ili_last_lsn) { - xlog_t *log = mp->m_log; - xfs_lsn_t sync_lsn; - int s, log_flags = XFS_LOG_FORCE; - - s = GRANT_LOCK(log); - sync_lsn = log->l_last_sync_lsn; - GRANT_UNLOCK(log, s); - - if ((XFS_LSN_CMP(iip->ili_last_lsn, sync_lsn) > 0)) { - if (flags & FLUSH_SYNC) - log_flags |= XFS_LOG_SYNC; - error = xfs_log_force(mp, iip->ili_last_lsn, log_flags); - if (error) - return error; - } - - if (ip->i_update_core == 0) - return 0; - } - } - /* * We make this non-blocking if the inode is contended, * return EAGAIN to indicate to the caller that they @@ -3586,30 +3563,22 @@ xfs_inode_flush( * blocking on inodes inside another operation right * now, they get caught later by xfs_sync. 
*/ - if (flags & FLUSH_INODE) { - int flush_flags; - - if (flags & FLUSH_SYNC) { - xfs_ilock(ip, XFS_ILOCK_SHARED); - xfs_iflock(ip); - } else if (xfs_ilock_nowait(ip, XFS_ILOCK_SHARED)) { - if (xfs_ipincount(ip) || !xfs_iflock_nowait(ip)) { - xfs_iunlock(ip, XFS_ILOCK_SHARED); - return EAGAIN; - } - } else { + if (flags & FLUSH_SYNC) { + xfs_ilock(ip, XFS_ILOCK_SHARED); + xfs_iflock(ip); + } else if (xfs_ilock_nowait(ip, XFS_ILOCK_SHARED)) { + if (xfs_ipincount(ip) || !xfs_iflock_nowait(ip)) { + xfs_iunlock(ip, XFS_ILOCK_SHARED); return EAGAIN; } - - if (flags & FLUSH_SYNC) - flush_flags = XFS_IFLUSH_SYNC; - else - flush_flags = XFS_IFLUSH_ASYNC; - - error = xfs_iflush(ip, flush_flags); - xfs_iunlock(ip, XFS_ILOCK_SHARED); + } else { + return EAGAIN; } + error = xfs_iflush(ip, (flags & FLUSH_SYNC) ? XFS_IFLUSH_SYNC + : XFS_IFLUSH_ASYNC); + xfs_iunlock(ip, XFS_ILOCK_SHARED); + return error; } --7ZAtKRhVyVSsbBD2-- From owner-xfs@oss.sgi.com Sun Nov 4 20:15:49 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 04 Nov 2007 20:16:11 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA54FjhE026281 for ; Sun, 4 Nov 2007 20:15:48 -0800 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA16991; Mon, 5 Nov 2007 15:15:44 +1100 Received: by chook.melbourne.sgi.com (Postfix, from userid 16346) id C86DB58C38F7; Mon, 5 Nov 2007 15:15:44 +1100 (EST) To: sgi.bugs.xfs@engr.sgi.com Cc: xfs@oss.sgi.com Subject: PARTIAL TAKE 971186 - Move platform specific mount option parsing out of core XFS code Message-Id: <20071105041544.C86DB58C38F7@chook.melbourne.sgi.com> Date: Mon, 5 
Nov 2007 15:15:44 +1100 (EST) From: dgc@sgi.com (David Chinner) X-Virus-Scanned: ClamAV 0.91.2/4673/Sun Nov 4 14:22:25 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13550 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Move platform specific mount option parsing out of core XFS code Mount option parsing is platform specific. Move it out of core code into the platform specific superblock operation file. Date: Mon Nov 5 15:14:58 AEDT 2007 Workarea: chook.melbourne.sgi.com:/build/dgc/isms/2.6.x-xfs Inspected by: hch@infradead.org The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/linux/2.6.x-xfs-melb Modid: xfs-linux-melb:xfs-kern:30012a fs/xfs/xfs_vfsops.c - 1.548 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_vfsops.c.diff?r1=text&tr1=1.548&r2=text&tr2=1.547&f=h - move linux specific mount option parsing into linux specific code. fs/xfs/linux-2.6/xfs_super.c - 1.403 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.6/xfs_super.c.diff?r1=text&tr1=1.403&r2=text&tr2=1.402&f=h - move linux specific mount option parsing into linux specific code. fs/xfs/xfs_vfsops.h - 1.6 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_vfsops.h.diff?r1=text&tr1=1.6&r2=text&tr2=1.5&f=h - move linux specific mount option parsing into linux specific code. 
From owner-xfs@oss.sgi.com Sun Nov 4 21:07:12 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 04 Nov 2007 21:07:20 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.5 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_62, J_CHICKENPOX_64 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA5574Qj006251 for ; Sun, 4 Nov 2007 21:07:09 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA18323; Mon, 5 Nov 2007 16:07:08 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lA5577dD94343975; Mon, 5 Nov 2007 16:07:08 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lA5576JN94246852; Mon, 5 Nov 2007 16:07:06 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Mon, 5 Nov 2007 16:07:06 +1100 From: David Chinner To: xfs-oss Cc: xfs-dev Subject: [PATCH, RFC] Move AIL pushing into a separate thread Message-ID: <20071105050706.GW66820511@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4673/Sun Nov 4 14:22:25 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13551 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs When many hundreds to thousands of threads all try to do simultaneous transactions and the log is in a tail-pushing situation (i.e. full), we can get multiple threads walking the AIL list and contending on the AIL lock. 
Recently we've had two cases of machines basically locking up because most of the CPUs in the system were trying to obtain the AIL lock. The first was an 8p machine with ~2,500 kernel threads trying to do transactions, and the latest is a 2048p Altix closing a file per MPI rank in a synchronised fashion, resulting in > 400 processes all trying to walk and push the AIL at the same time.

The AIL push is, in effect, a simple I/O dispatch algorithm complicated by the ordering constraints placed on it by the transaction subsystem. It really does not need multiple threads to push on it - even when only a single CPU is pushing the AIL, it can push the I/O out far faster than pretty much any disk subsystem can handle.

So, to avoid contention problems stemming from multiple list walkers, move the list walk off into another thread and simply provide a "target" to push to. When a thread requires a push, it sets the target and wakes the push thread, then goes to sleep waiting for the required amount of space to become available in the log.

This mechanism should also be a lot fairer under heavy load as the waiters will queue in arrival order, rather than queuing in "who completed a push first" order.

Also, by moving the pushing to a separate thread we can do more effective overload detection and prevention as we can keep context from loop iteration to loop iteration. That is, we can push only part of the list each loop and not have to loop back to the start of the list every time we run. This should also help by reducing the number of times we try to lock and/or push items that we cannot move.

Note that this patch is not intended to solve the inefficiencies in the AIL structure and the associated issues with extremely large list contents. That needs to be addressed separately; parallel access would cause problems to any new structure as well, so I'm only aiming to isolate the structure from unbounded parallelism here.
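To illustrate the idea outside the kernel, here is a rough userspace sketch of the "set a target, let one pusher walk the list in bounded passes" scheme described above. It is not the patch itself: `struct pusher`, `pusher_set_target()` and `pusher_run_pass()` are hypothetical names, LSNs are modelled as plain integers rather than compared with XFS_LSN_CMP(), the AIL is modelled as a sorted array, and there is no locking or real thread wakeup.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model only: an LSN is just an integer here. */
typedef uint64_t lsn_t;

struct pusher {
	lsn_t	target;		/* furthest LSN anyone has asked us to push to */
	lsn_t	last_pushed;	/* where the previous pass stopped, 0 = list head */
	int	wakeups;	/* how often the push thread would have been woken */
};

/*
 * Caller side: the target only ever moves forward. Returns 1 if the
 * push thread would need a wakeup, 0 if the existing target already
 * covers this request.
 */
static int
pusher_set_target(struct pusher *p, lsn_t threshold)
{
	if (threshold > p->target) {
		p->target = threshold;
		p->wakeups++;
		return 1;
	}
	return 0;
}

/*
 * Push-thread side: one bounded pass over a list of LSNs sorted in
 * ascending order. At most 'budget' items below the target are pushed,
 * resuming after the point where the previous pass stopped. Returns
 * the number of items pushed in this pass.
 */
static int
pusher_run_pass(struct pusher *p, const lsn_t *items, int nitems, int budget)
{
	int	pushed = 0;
	int	i = 0;

	assert(budget > 0);

	/* Skip what earlier passes already handled. */
	while (i < nitems && items[i] <= p->last_pushed)
		i++;

	/* Push items below the target until the budget runs out. */
	while (i < nitems && items[i] < p->target && pushed < budget) {
		p->last_pushed = items[i];
		pushed++;
		i++;
	}

	/* Nothing left below the target: next pass restarts at the head. */
	if (i >= nitems || items[i] >= p->target)
		p->last_pushed = 0;

	return pushed;
}
```

The point of keeping `last_pushed` in the pusher is the one made in the text: because a single thread owns the walk, it can carry context from one pass to the next instead of restarting from the head of the list every time.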
Signed-Off-By: Dave Chinner --- fs/xfs/linux-2.6/xfs_super.c | 60 +++++++++++ fs/xfs/xfs_log.c | 12 ++ fs/xfs/xfs_mount.c | 6 - fs/xfs/xfs_mount.h | 10 + fs/xfs/xfs_trans.h | 1 fs/xfs/xfs_trans_ail.c | 231 ++++++++++++++++++++++++++++--------------- fs/xfs/xfs_trans_priv.h | 8 + fs/xfs/xfsidbg.c | 12 +- 8 files changed, 247 insertions(+), 93 deletions(-) Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_super.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_super.c 2007-11-05 10:39:05.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_super.c 2007-11-05 14:48:39.871177707 +1100 @@ -51,6 +51,7 @@ #include "xfs_vfsops.h" #include "xfs_version.h" #include "xfs_log_priv.h" +#include "xfs_trans_priv.h" #include #include @@ -765,6 +766,65 @@ xfs_blkdev_issue_flush( blkdev_issue_flush(buftarg->bt_bdev, NULL); } +/* + * XFS AIL push thread support + */ +void +xfsaild_wakeup( + xfs_mount_t *mp, + xfs_lsn_t threshold_lsn) +{ + + if (XFS_LSN_CMP(threshold_lsn, mp->m_ail.xa_target) > 0) { + mp->m_ail.xa_target = threshold_lsn; + wake_up_process(mp->m_ail.xa_task); + } +} + +int +xfsaild( + void *data) +{ + xfs_mount_t *mp = (xfs_mount_t *)data; + xfs_lsn_t last_pushed_lsn = 0; + long tout = 0; + + while (!kthread_should_stop()) { + if (tout) + schedule_timeout_interruptible(msecs_to_jiffies(tout)); + + /* swsusp */ + try_to_freeze(); + + /* we're either starting or stopping if there is no log */ + if (!mp->m_log) + continue; + + tout = xfsaild_push(mp, &last_pushed_lsn); + } + + return 0; +} /* xfsaild */ + +void +xfsaild_start( + xfs_mount_t *mp) +{ + mp->m_ail.xa_target = 0; + mp->m_ail.xa_task = kthread_run(xfsaild, mp, "xfsaild"); + ASSERT(!IS_ERR(mp->m_ail.xa_task)); + /* XXX: should return error but nowhere to do it */ +} + +void +xfsaild_stop( + xfs_mount_t *mp) +{ + kthread_stop(mp->m_ail.xa_task); +} + + + STATIC struct inode * xfs_fs_alloc_inode( struct super_block *sb) Index: 2.6.x-xfs-new/fs/xfs/xfs_log.c 
=================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_log.c 2007-11-02 18:00:19.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_log.c 2007-11-05 14:07:16.850189316 +1100 @@ -515,6 +515,12 @@ xfs_log_mount(xfs_mount_t *mp, mp->m_log = xlog_alloc_log(mp, log_target, blk_offset, num_bblks); /* + * Initialize the AIL now we have a log. + */ + spin_lock_init(&mp->m_ail_lock); + xfs_trans_ail_init(mp); + + /* * skip log recovery on a norecovery mount. pretend it all * just worked. */ @@ -530,7 +536,7 @@ xfs_log_mount(xfs_mount_t *mp, mp->m_flags |= XFS_MOUNT_RDONLY; if (error) { cmn_err(CE_WARN, "XFS: log mount/recovery failed: error %d", error); - xlog_dealloc_log(mp->m_log); + xfs_log_unmount_dealloc(mp); return error; } } @@ -722,10 +728,14 @@ xfs_log_unmount_write(xfs_mount_t *mp) /* * Deallocate log structures for unmount/relocation. + * + * We need to stop the aild from running before we destroy + * and deallocate the log as the aild references the log. */ void xfs_log_unmount_dealloc(xfs_mount_t *mp) { + xfs_trans_ail_destroy(mp); xlog_dealloc_log(mp->m_log); } Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.c 2007-11-02 13:44:50.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_mount.c 2007-11-05 14:12:22.554601173 +1100 @@ -137,15 +137,9 @@ xfs_mount_init(void) mp->m_flags |= XFS_MOUNT_NO_PERCPU_SB; } - spin_lock_init(&mp->m_ail_lock); spin_lock_init(&mp->m_sb_lock); mutex_init(&mp->m_ilock); mutex_init(&mp->m_growlock); - /* - * Initialize the AIL. 
- */ - xfs_trans_ail_init(mp); - atomic_set(&mp->m_active_trans, 0); return mp; Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.h 2007-10-16 08:52:58.000000000 +1000 +++ 2.6.x-xfs-new/fs/xfs/xfs_mount.h 2007-11-05 14:14:42.652456849 +1100 @@ -219,12 +219,18 @@ extern void xfs_icsb_sync_counters_flags #define xfs_icsb_sync_counters_flags(mp, flags) do { } while (0) #endif +typedef struct xfs_ail { + xfs_ail_entry_t xa_ail; + uint xa_gen; + struct task_struct *xa_task; + xfs_lsn_t xa_target; +} xfs_ail_t; + typedef struct xfs_mount { struct super_block *m_super; xfs_tid_t m_tid; /* next unused tid for fs */ spinlock_t m_ail_lock; /* fs AIL mutex */ - xfs_ail_entry_t m_ail; /* fs active log item list */ - uint m_ail_gen; /* fs AIL generation count */ + xfs_ail_t m_ail; /* fs active log item list */ xfs_sb_t m_sb; /* copy of fs superblock */ spinlock_t m_sb_lock; /* sb counter lock */ struct xfs_buf *m_sb_bp; /* buffer for superblock */ Index: 2.6.x-xfs-new/fs/xfs/xfs_trans.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_trans.h 2007-11-02 13:44:46.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_trans.h 2007-11-05 14:01:13.205272667 +1100 @@ -993,6 +993,7 @@ int _xfs_trans_commit(xfs_trans_t *, #define xfs_trans_commit(tp, flags) _xfs_trans_commit(tp, flags, NULL) void xfs_trans_cancel(xfs_trans_t *, int); void xfs_trans_ail_init(struct xfs_mount *); +void xfs_trans_ail_destroy(struct xfs_mount *); xfs_lsn_t xfs_trans_push_ail(struct xfs_mount *, xfs_lsn_t); xfs_lsn_t xfs_trans_tail_ail(struct xfs_mount *); void xfs_trans_unlocked_item(struct xfs_mount *, Index: 2.6.x-xfs-new/fs/xfs/xfs_trans_ail.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_trans_ail.c 2007-10-02 16:01:48.000000000 +1000 +++ 2.6.x-xfs-new/fs/xfs/xfs_trans_ail.c 2007-11-05 14:46:44.206327966 
+1100 @@ -57,7 +57,7 @@ xfs_trans_tail_ail( xfs_log_item_t *lip; spin_lock(&mp->m_ail_lock); - lip = xfs_ail_min(&(mp->m_ail)); + lip = xfs_ail_min(&(mp->m_ail.xa_ail)); if (lip == NULL) { lsn = (xfs_lsn_t)0; } else { @@ -71,25 +71,22 @@ xfs_trans_tail_ail( /* * xfs_trans_push_ail * - * This routine is called to move the tail of the AIL - * forward. It does this by trying to flush items in the AIL - * whose lsns are below the given threshold_lsn. + * This routine is called to move the tail of the AIL forward. It does this by + * trying to flush items in the AIL whose lsns are below the given + * threshold_lsn. * - * The routine returns the lsn of the tail of the log. + * the push is run asynchronously in a separate thread, so we return the tail + * of the log right now instead of the tail after the push. This means we will + * either continue right away, or we will sleep waiting on the async thread to + * do it's work. */ xfs_lsn_t xfs_trans_push_ail( xfs_mount_t *mp, xfs_lsn_t threshold_lsn) { - xfs_lsn_t lsn; xfs_log_item_t *lip; int gen; - int restarts; - int lock_result; - int flush_log; - -#define XFS_TRANS_PUSH_AIL_RESTARTS 1000 spin_lock(&mp->m_ail_lock); lip = xfs_trans_first_ail(mp, &gen); @@ -100,57 +97,105 @@ xfs_trans_push_ail( spin_unlock(&mp->m_ail_lock); return (xfs_lsn_t)0; } + if (XFS_LSN_CMP(threshold_lsn, mp->m_ail.xa_target) > 0) + xfsaild_wakeup(mp, threshold_lsn); + spin_unlock(&mp->m_ail_lock); + return (xfs_lsn_t)lip->li_lsn; +} + +/* + * Return the item in the AIL with the current lsn. + * Return the current tree generation number for use + * in calls to xfs_trans_next_ail(). 
+ */ +STATIC xfs_log_item_t * +xfs_trans_first_push_ail( + xfs_mount_t *mp, + int *gen, + xfs_lsn_t lsn) +{ + xfs_log_item_t *lip; + + lip = xfs_ail_min(&(mp->m_ail.xa_ail)); + *gen = (int)mp->m_ail.xa_gen; + while (lip && (XFS_LSN_CMP(lip->li_lsn, lsn) < 0)) + lip = lip->li_ail.ail_forw; + + return (lip); +} + +/* + * Function that does the work of pushing on the AIL + */ +long +xfsaild_push( + xfs_mount_t *mp, + xfs_lsn_t *last_lsn) +{ + long tout = 100; /* milliseconds */ + xfs_lsn_t last_pushed_lsn = *last_lsn; + xfs_lsn_t target = mp->m_ail.xa_target; + xfs_lsn_t lsn; + xfs_log_item_t *lip; + int lock_result; + int gen; + int restarts; + int flush_log, count, stuck; + +#define XFS_TRANS_PUSH_AIL_RESTARTS 10 + + spin_lock(&mp->m_ail_lock); + lip = xfs_trans_first_push_ail(mp, &gen, *last_lsn); + if (lip == NULL || XFS_FORCED_SHUTDOWN(mp)) { + /* + * AIL is empty or our push has reached the end. + */ + spin_unlock(&mp->m_ail_lock); + last_pushed_lsn = 0; + goto out; + } XFS_STATS_INC(xs_push_ail); /* * While the item we are looking at is below the given threshold - * try to flush it out. Make sure to limit the number of times - * we allow xfs_trans_next_ail() to restart scanning from the - * beginning of the list. We'd like not to stop until we've at least + * try to flush it out. We'd like not to stop until we've at least * tried to push on everything in the AIL with an LSN less than - * the given threshold. However, we may give up before that if - * we realize that we've been holding the AIL lock for 'too long', - * blocking interrupts. Currently, too long is < 500us roughly. + * the given threshold. + * + * However, we will stop after a certain number of pushes and wait + * for a reduced timeout to fire before pushing further. This + * prevents use from spinning when we can't do anything or there is + * lots of contention on the AIL lists. 
*/ - flush_log = 0; - restarts = 0; - while (((restarts < XFS_TRANS_PUSH_AIL_RESTARTS) && - (XFS_LSN_CMP(lip->li_lsn, threshold_lsn) < 0))) { + tout = 10; + lsn = lip->li_lsn; + flush_log = stuck = count = 0; + while ((XFS_LSN_CMP(lip->li_lsn, target) < 0)) { /* - * If we can lock the item without sleeping, unlock - * the AIL lock and flush the item. Then re-grab the - * AIL lock so we can look for the next item on the - * AIL. Since we unlock the AIL while we flush the - * item, the next routine may start over again at the - * the beginning of the list if anything has changed. - * That is what the generation count is for. + * If we can lock the item without sleeping, unlock the AIL + * lock and flush the item. Then re-grab the AIL lock so we + * can look for the next item on the AIL. List changes are + * handled by the AIL lookup functions internally * - * If we can't lock the item, either its holder will flush - * it or it is already being flushed or it is being relogged. - * In any of these case it is being taken care of and we - * can just skip to the next item in the list. + * If we can't lock the item, either its holder will flush it + * or it is already being flushed or it is being relogged. In + * any of these case it is being taken care of and we can just + * skip to the next item in the list. 
*/ lock_result = IOP_TRYLOCK(lip); + spin_unlock(&mp->m_ail_lock); switch (lock_result) { case XFS_ITEM_SUCCESS: - spin_unlock(&mp->m_ail_lock); XFS_STATS_INC(xs_push_ail_success); IOP_PUSH(lip); - spin_lock(&mp->m_ail_lock); + last_pushed_lsn = lsn; break; case XFS_ITEM_PUSHBUF: - spin_unlock(&mp->m_ail_lock); XFS_STATS_INC(xs_push_ail_pushbuf); -#ifdef XFSRACEDEBUG - delay_for_intr(); - delay(300); -#endif - ASSERT(lip->li_ops->iop_pushbuf); - ASSERT(lip); IOP_PUSHBUF(lip); - spin_lock(&mp->m_ail_lock); + last_pushed_lsn = lsn; break; case XFS_ITEM_PINNED: @@ -160,10 +205,14 @@ xfs_trans_push_ail( case XFS_ITEM_LOCKED: XFS_STATS_INC(xs_push_ail_locked); + last_pushed_lsn = lsn; + stuck++; break; case XFS_ITEM_FLUSHING: XFS_STATS_INC(xs_push_ail_flushing); + last_pushed_lsn = lsn; + stuck++; break; default: @@ -171,19 +220,26 @@ xfs_trans_push_ail( break; } + spin_lock(&mp->m_ail_lock); + count++; + /* Too many items we can't do anything with? */ + if (stuck > 100) + break; + /* we're either starting or stopping if there is no log */ + if (!mp->m_log) + break; + /* should we bother continuing? */ + if (XFS_FORCED_SHUTDOWN(mp)) + break; + /* get the next item */ lip = xfs_trans_next_ail(mp, lip, &gen, &restarts); - if (lip == NULL) { + if (lip == NULL) break; - } - if (XFS_FORCED_SHUTDOWN(mp)) { - /* - * Just return if we shut down during the last try. - */ - spin_unlock(&mp->m_ail_lock); - return (xfs_lsn_t)0; - } - + if (restarts > XFS_TRANS_PUSH_AIL_RESTARTS) + break; + lsn = lip->li_lsn; } + spin_unlock(&mp->m_ail_lock); if (flush_log) { /* @@ -191,22 +247,33 @@ xfs_trans_push_ail( * push out the log so it will become unpinned and * move forward in the AIL. 
*/ - spin_unlock(&mp->m_ail_lock); XFS_STATS_INC(xs_push_ail_flush); xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE); - spin_lock(&mp->m_ail_lock); } - lip = xfs_ail_min(&(mp->m_ail)); - if (lip == NULL) { - lsn = (xfs_lsn_t)0; - } else { - lsn = lip->li_lsn; + /* + * We reached the target so wait a bit longer for I/O to complete and + * remove pushed items from the AIL before we start the next scan from + * the start of the AIL. + */ + if ((XFS_LSN_CMP(lsn, target) >= 0)) { + tout += 20; + last_pushed_lsn = 0; + } else if ((restarts > XFS_TRANS_PUSH_AIL_RESTARTS) || + (count && (count < (stuck + 10)))) { + /* + * Either there is a lot of contention on the AIL or we + * found a lot of items we couldn't do anything with. + * Backoff a bit more to allow some I/O to complete before + * continuing from where we were. + */ + tout += 10; } - spin_unlock(&mp->m_ail_lock); - return lsn; -} /* xfs_trans_push_ail */ +out: + *last_lsn = last_pushed_lsn; + return tout; +} /* xfsaild_push */ /* @@ -247,7 +314,7 @@ xfs_trans_unlocked_item( * the call to xfs_log_move_tail() doesn't do anything if there's * not enough free space to wake people up so we're safe calling it. 
*/ - min_lip = xfs_ail_min(&mp->m_ail); + min_lip = xfs_ail_min(&mp->m_ail.xa_ail); if (min_lip == lip) xfs_log_move_tail(mp, 1); @@ -279,7 +346,7 @@ xfs_trans_update_ail( xfs_log_item_t *dlip=NULL; xfs_log_item_t *mlip; /* ptr to minimum lip */ - ailp = &(mp->m_ail); + ailp = &(mp->m_ail.xa_ail); mlip = xfs_ail_min(ailp); if (lip->li_flags & XFS_LI_IN_AIL) { @@ -292,10 +359,10 @@ xfs_trans_update_ail( lip->li_lsn = lsn; xfs_ail_insert(ailp, lip); - mp->m_ail_gen++; + mp->m_ail.xa_gen++; if (mlip == dlip) { - mlip = xfs_ail_min(&(mp->m_ail)); + mlip = xfs_ail_min(&(mp->m_ail.xa_ail)); spin_unlock(&mp->m_ail_lock); xfs_log_move_tail(mp, mlip->li_lsn); } else { @@ -330,7 +397,7 @@ xfs_trans_delete_ail( xfs_log_item_t *mlip; if (lip->li_flags & XFS_LI_IN_AIL) { - ailp = &(mp->m_ail); + ailp = &(mp->m_ail.xa_ail); mlip = xfs_ail_min(ailp); dlip = xfs_ail_delete(ailp, lip); ASSERT(dlip == lip); @@ -338,10 +405,10 @@ xfs_trans_delete_ail( lip->li_flags &= ~XFS_LI_IN_AIL; lip->li_lsn = 0; - mp->m_ail_gen++; + mp->m_ail.xa_gen++; if (mlip == dlip) { - mlip = xfs_ail_min(&(mp->m_ail)); + mlip = xfs_ail_min(&(mp->m_ail.xa_ail)); spin_unlock(&mp->m_ail_lock); xfs_log_move_tail(mp, (mlip ? 
mlip->li_lsn : 0)); } else { @@ -379,10 +446,10 @@ xfs_trans_first_ail( { xfs_log_item_t *lip; - lip = xfs_ail_min(&(mp->m_ail)); - *gen = (int)mp->m_ail_gen; + lip = xfs_ail_min(&(mp->m_ail.xa_ail)); + *gen = (int)mp->m_ail.xa_gen; - return (lip); + return lip; } /* @@ -402,11 +469,11 @@ xfs_trans_next_ail( xfs_log_item_t *nlip; ASSERT(mp && lip && gen); - if (mp->m_ail_gen == *gen) { - nlip = xfs_ail_next(&(mp->m_ail), lip); + if (mp->m_ail.xa_gen == *gen) { + nlip = xfs_ail_next(&(mp->m_ail.xa_ail), lip); } else { - nlip = xfs_ail_min(&(mp->m_ail)); - *gen = (int)mp->m_ail_gen; + nlip = xfs_ail_min(&(mp->m_ail).xa_ail); + *gen = (int)mp->m_ail.xa_gen; if (restarts != NULL) { XFS_STATS_INC(xs_push_ail_restarts); (*restarts)++; @@ -435,8 +502,16 @@ void xfs_trans_ail_init( xfs_mount_t *mp) { - mp->m_ail.ail_forw = (xfs_log_item_t*)&(mp->m_ail); - mp->m_ail.ail_back = (xfs_log_item_t*)&(mp->m_ail); + mp->m_ail.xa_ail.ail_forw = (xfs_log_item_t*)&mp->m_ail.xa_ail; + mp->m_ail.xa_ail.ail_back = (xfs_log_item_t*)&mp->m_ail.xa_ail; + xfsaild_start(mp); +} + +void +xfs_trans_ail_destroy( + xfs_mount_t *mp) +{ + xfsaild_stop(mp); } /* Index: 2.6.x-xfs-new/fs/xfs/xfs_trans_priv.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_trans_priv.h 2007-10-02 16:01:48.000000000 +1000 +++ 2.6.x-xfs-new/fs/xfs/xfs_trans_priv.h 2007-11-05 14:02:18.784782356 +1100 @@ -57,4 +57,12 @@ struct xfs_log_item *xfs_trans_next_ail( struct xfs_log_item *, int *, int *); +/* + * AIL push thread support + */ +long xfsaild_push(struct xfs_mount *, xfs_lsn_t *); +void xfsaild_wakeup(struct xfs_mount *, xfs_lsn_t); +void xfsaild_start(struct xfs_mount *); +void xfsaild_stop(struct xfs_mount *); + #endif /* __XFS_TRANS_PRIV_H__ */ Index: 2.6.x-xfs-new/fs/xfs/xfsidbg.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfsidbg.c 2007-11-02 13:44:50.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfsidbg.c 
2007-11-05 14:50:43.099049624 +1100 @@ -6220,13 +6220,13 @@ xfsidbg_xaildump(xfs_mount_t *mp) }; int count; - if ((mp->m_ail.ail_forw == NULL) || - (mp->m_ail.ail_forw == (xfs_log_item_t *)&mp->m_ail)) { + if ((mp->m_ail.xa_ail.ail_forw == NULL) || + (mp->m_ail.xa_ail.ail_forw == (xfs_log_item_t *)&mp->m_ail.xa_ail)) { kdb_printf("AIL is empty\n"); return; } kdb_printf("AIL for mp 0x%p, oldest first\n", mp); - lip = (xfs_log_item_t*)mp->m_ail.ail_forw; + lip = (xfs_log_item_t*)mp->m_ail.xa_ail.ail_forw; for (count = 0; lip; count++) { kdb_printf("[%d] type %s ", count, xfsidbg_item_type_str(lip)); printflags((uint)(lip->li_flags), li_flags, "flags:"); @@ -6255,7 +6255,7 @@ xfsidbg_xaildump(xfs_mount_t *mp) break; } - if (lip->li_ail.ail_forw == (xfs_log_item_t*)&mp->m_ail) { + if (lip->li_ail.ail_forw == (xfs_log_item_t*)&mp->m_ail.xa_ail) { lip = NULL; } else { lip = lip->li_ail.ail_forw; @@ -6312,9 +6312,9 @@ xfsidbg_xmount(xfs_mount_t *mp) kdb_printf("xfs_mount at 0x%p\n", mp); kdb_printf("tid 0x%x ail_lock 0x%p &ail 0x%p\n", - mp->m_tid, &mp->m_ail_lock, &mp->m_ail); + mp->m_tid, &mp->m_ail_lock, &mp->m_ail.xa_ail); kdb_printf("ail_gen 0x%x &sb 0x%p\n", - mp->m_ail_gen, &mp->m_sb); + mp->m_ail.xa_gen, &mp->m_sb); kdb_printf("sb_lock 0x%p sb_bp 0x%p dev 0x%x logdev 0x%x rtdev 0x%x\n", &mp->m_sb_lock, mp->m_sb_bp, mp->m_ddev_targp ? 
mp->m_ddev_targp->bt_dev : 0, From owner-xfs@oss.sgi.com Sun Nov 4 21:32:03 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 04 Nov 2007 21:32:09 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA55Vwg1009794 for ; Sun, 4 Nov 2007 21:32:02 -0800 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA18760; Mon, 5 Nov 2007 16:31:57 +1100 Received: by chook.melbourne.sgi.com (Postfix, from userid 16346) id 9887158C38F7; Mon, 5 Nov 2007 16:31:57 +1100 (EST) To: sgi.bugs.xfs@engr.sgi.com Cc: xfs@oss.sgi.com Subject: PARTITAL TAKE 971186 - optimize XFS_IS_REALTIME_INODE w/o realtime config Message-Id: <20071105053157.9887158C38F7@chook.melbourne.sgi.com> Date: Mon, 5 Nov 2007 16:31:57 +1100 (EST) From: dgc@sgi.com (David Chinner) X-Virus-Scanned: ClamAV 0.91.2/4673/Sun Nov 4 14:22:25 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13552 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs optimize XFS_IS_REALTIME_INODE w/o realtime config Use XFS_IS_REALTIME_INODE in more places, and #define it to 0 if CONFIG_XFS_RT is off. This should be safe because mount checks in xfs_rtmount_init: # define xfs_rtmount_init(m) (((mp)->m_sb.sb_rblocks == 0)? 0 : (ENOSYS)) so if we get mounted w/o CONFIG_XFS_RT, no realtime inodes should be encountered after that. 
Defining XFS_IS_REALTIME_INODE to 0 saves a bit of stack space; presumably gcc can optimize around the various "if (0)" type checks:

xfs_alloc_file_space -8
xfs_bmap_adjacent -16
xfs_bmapi -8
xfs_bmap_rtalloc -16
xfs_bunmapi -28
xfs_free_file_space -64
xfs_imap +8 <-- ? hmm.
xfs_iomap_write_direct -12
xfs_qm_dqusage_adjust -4
xfs_qm_vop_chown_reserve -4

Signed-off-by: Eric Sandeen

Date: Mon Nov 5 16:31:21 AEDT 2007
Workarea: chook.melbourne.sgi.com:/build/dgc/isms/2.6.x-xfs
Inspected by: sandeen@sandeen.net

The following file(s) were checked into:
longdrop.melbourne.sgi.com:/isms/linux/2.6.x-xfs-melb

Modid: xfs-linux-melb:xfs-kern:30014a

fs/xfs/xfs_rw.h - 1.86 - changed
http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_rw.h.diff?r1=text&tr1=1.86&r2=text&tr2=1.85&f=h
- Use XFS_IS_REALTIME_INODE() rather than open coding the check.

fs/xfs/xfs_vnodeops.c - 1.725 - changed
http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_vnodeops.c.diff?r1=text&tr1=1.725&r2=text&tr2=1.724&f=h
- Use XFS_IS_REALTIME_INODE() rather than open coding the check.

fs/xfs/xfs_rtalloc.h - 1.30 - changed
http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_rtalloc.h.diff?r1=text&tr1=1.30&r2=text&tr2=1.29&f=h
- Use XFS_IS_REALTIME_INODE() rather than open coding the check.

fs/xfs/xfs_dfrag.c - 1.63 - changed
http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_dfrag.c.diff?r1=text&tr1=1.63&r2=text&tr2=1.62&f=h
- Use XFS_IS_REALTIME_INODE() rather than open coding the check.

fs/xfs/xfs_bmap_btree.c - 1.167 - changed
http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_bmap_btree.c.diff?r1=text&tr1=1.167&r2=text&tr2=1.166&f=h
- Use XFS_IS_REALTIME_INODE() rather than open coding the check.

fs/xfs/xfs_inode.c - 1.486 - changed
http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_inode.c.diff?r1=text&tr1=1.486&r2=text&tr2=1.485&f=h
- Use XFS_IS_REALTIME_INODE() rather than open coding the check.

fs/xfs/xfs_bmap.c - 1.381 - changed
http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_bmap.c.diff?r1=text&tr1=1.381&r2=text&tr2=1.380&f=h
- Use XFS_IS_REALTIME_INODE() rather than open coding the check.

fs/xfs/xfs_dinode.h - 1.83 - changed
http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_dinode.h.diff?r1=text&tr1=1.83&r2=text&tr2=1.82&f=h
- Use XFS_IS_REALTIME_INODE() rather than open coding the check.

fs/xfs/xfs_iomap.c - 1.61 - changed
http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_iomap.c.diff?r1=text&tr1=1.61&r2=text&tr2=1.60&f=h
- Use XFS_IS_REALTIME_INODE() rather than open coding the check.

fs/xfs/linux-2.6/xfs_lrw.c - 1.271 - changed
http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.6/xfs_lrw.c.diff?r1=text&tr1=1.271&r2=text&tr2=1.270&f=h
- Use XFS_IS_REALTIME_INODE() rather than open coding the check.

fs/xfs/linux-2.6/xfs_ioctl.c - 1.158 - changed
http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.6/xfs_ioctl.c.diff?r1=text&tr1=1.158&r2=text&tr2=1.157&f=h
- Use XFS_IS_REALTIME_INODE() rather than open coding the check.

fs/xfs/linux-2.6/xfs_iops.c - 1.269 - changed
http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.6/xfs_iops.c.diff?r1=text&tr1=1.269&r2=text&tr2=1.268&f=h
- Use XFS_IS_REALTIME_INODE() rather than open coding the check.

fs/xfs/linux-2.6/xfs_aops.c - 1.158 - changed
http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.6/xfs_aops.c.diff?r1=text&tr1=1.158&r2=text&tr2=1.157&f=h
- Use XFS_IS_REALTIME_INODE() rather than open coding the check.

fs/xfs/dmapi/xfs_dm.c - 1.59 - changed
http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/dmapi/xfs_dm.c.diff?r1=text&tr1=1.59&r2=text&tr2=1.58&f=h
- Use XFS_IS_REALTIME_INODE() rather than open coding the check.
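
[Archive note] The commit message above relies on a compile-time constant letting the compiler discard realtime-only branches. A minimal sketch of that pattern follows. It is an illustration under stated assumptions, not the XFS code: every DEMO_* name, struct demo_inode and demo_pick_allocator() are hypothetical stand-ins, and the (void) cast is added here only to keep unused-parameter warnings quiet.

```c
/*
 * Illustrative stand-ins only: the DEMO_* names, struct demo_inode and
 * demo_pick_allocator() are hypothetical, not the real XFS definitions.
 */
#define DEMO_DIFLAG_REALTIME	0x1u

struct demo_inode {
	unsigned int	di_flags;
};

#ifdef DEMO_CONFIG_RT
/* Realtime support built in: test the per-inode flag at run time. */
# define DEMO_IS_REALTIME_INODE(ip) \
	(((ip)->di_flags & DEMO_DIFLAG_REALTIME) != 0)
#else
/*
 * Realtime support compiled out: the macro is a constant 0, so the body
 * of every "if (DEMO_IS_REALTIME_INODE(ip))" is dead code the compiler
 * is free to drop. The (void) cast only marks ip as used.
 */
# define DEMO_IS_REALTIME_INODE(ip)	((void)(ip), 0)
#endif

static const char *
demo_pick_allocator(const struct demo_inode *ip)
{
	if (DEMO_IS_REALTIME_INODE(ip))
		return "realtime";	/* unreachable without DEMO_CONFIG_RT */
	return "data";
}
```

Built without -DDEMO_CONFIG_RT, demo_pick_allocator() returns "data" even for an inode whose flag is set, which is why the approach is only safe when the mount path has already refused filesystems with realtime blocks, as the message describes. Whether a given function's stack frame actually shrinks depends on the compiler and the surrounding code.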
From owner-xfs@oss.sgi.com Sun Nov 4 23:01:41 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 04 Nov 2007 23:01:51 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from py-out-1112.google.com (py-out-1112.google.com [64.233.166.183]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA571cYT019277 for ; Sun, 4 Nov 2007 23:01:41 -0800 Received: by py-out-1112.google.com with SMTP id u77so3004464pyb for ; Sun, 04 Nov 2007 23:01:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=tph1GPI9/q/B4mKi93htGjseE4k+yn0OOX1KqcZ5qRk=; b=poX06C9iUGzSK3q9s+cmKjJrs5shzGvdxX5PahVPVnyqnUex2SMqQ8VTiwCH6I45UvGiMToac4QF9bLNfVUQ9diL9XNDnS6z+xQfCcGmUzTKZP/xBOJkxz4WSLASIscVNzLFnFuZQxmzPV2HOs8+RJ0Typaf/FQmsRnBOnP3xk4= DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=beta; h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=TyUw72JHP3WNrEjNA1Shgwnf2+/8KJB3ZoncyKvWMobvWLDuJD30/DFkysdbb215Vu+JZ59AkjI/7BaDSajaBw5IjEl4IaBQuKN6CIPfqc7LGyFRNr51hmAe1GgWnWaOqcK+hsesWfNXr52Oa01oXo6Fx9iZs4nh3b3f/F4TafA= Received: by 10.64.27.13 with SMTP id a13mr12458999qba.1194246102047; Sun, 04 Nov 2007 23:01:42 -0800 (PST) Received: by 10.65.112.13 with HTTP; Sun, 4 Nov 2007 23:01:41 -0800 (PST) Message-ID: <64bb37e0711042301l54f1aca4qc36b184be5caa12b@mail.gmail.com> Date: Mon, 5 Nov 2007 08:01:41 +0100 From: "Torsten Kaiser" To: "David Chinner" Subject: Re: writeout stalls in current -git Cc: "Peter Zijlstra" , "Fengguang Wu" , "Maxim Levitsky" , linux-kernel@vger.kernel.org, "Andrew Morton" , 
linux-fsdevel@vger.kernel.org, xfs@oss.sgi.com In-Reply-To: <20071105014510.GU66820511@sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <393060478.03650@ustc.edu.cn> <393903856.06449@ustc.edu.cn> <64bb37e0711011120i63cdfe3ci18995d57b6649a8@mail.gmail.com> <64bb37e0711011200n228e708eg255640388f83da22@mail.gmail.com> <1193998532.27652.343.camel@twins> <64bb37e0711021222q7d12c825mc62d433c4fe19e8@mail.gmail.com> <20071102204258.GR995458@sgi.com> <64bb37e0711040319l5de285c3xea64474540a51b6e@mail.gmail.com> <20071105014510.GU66820511@sgi.com> X-Virus-Scanned: ClamAV 0.91.2/4673/Sun Nov 4 14:22:25 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13553 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: just.for.lkml@googlemail.com Precedence: bulk X-list: xfs On 11/5/07, David Chinner wrote: > On Sun, Nov 04, 2007 at 12:19:19PM +0100, Torsten Kaiser wrote: > > I can now confirm, that I see this also with the current mainline-git-version > > I used 2.6.24-rc1-git-b4f555081fdd27d13e6ff39d455d5aefae9d2c0c > > plus the fix for the sg changes in ieee1394. > > Ok, so it's probably a side effect of the writeback changes. > > Attached are two patches (two because one was in a separate patchset as > a standalone change) that should prevent async writeback from blocking > on locked inode cluster buffers. Apply the xfs-factor-inotobp patch first. > Can you see if this fixes the problem? Applied both patches against the kernel mentioned above. This blows up at boot: [ 80.807589] Filesystem "dm-0": Disabling barriers, not supported by the underlying device [ 80.820241] XFS mounting filesystem dm-0 [ 80.913144] ------------[ cut here ]------------ [ 80.914932] kernel BUG at drivers/md/raid5.c:143! 
[ 80.916751] invalid opcode: 0000 [1] SMP [ 80.918338] CPU 3 [ 80.919142] Modules linked in: [ 80.920345] Pid: 974, comm: md1_raid5 Not tainted 2.6.24-rc1 #3 [ 80.922628] RIP: 0010:[] [] __release_stripe+0x164/0x170 [ 80.925935] RSP: 0018:ffff8100060e7dd0 EFLAGS: 00010002 [ 80.927987] RAX: 0000000000000000 RBX: ffff81010141c288 RCX: 0000000000000000 [ 80.930738] RDX: 0000000000000000 RSI: ffff81010141c288 RDI: ffff810004fb3200 [ 80.933488] RBP: ffff810004fb3200 R08: 0000000000000000 R09: 0000000000000005 [ 80.936240] R10: 0000000000000e00 R11: ffffe200038465e8 R12: ffff81010141c298 [ 80.938990] R13: 0000000000000286 R14: ffff810004fb3330 R15: 0000000000000000 [ 80.941741] FS: 000000000060c870(0000) GS:ffff810100313700(0000) knlGS:0000000000000000 [ 80.944861] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b [ 80.947080] CR2: 00007fff7b295000 CR3: 0000000101842000 CR4: 00000000000006e0 [ 80.949830] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 80.952580] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 80.955332] Process md1_raid5 (pid: 974, threadinfo ffff8100060e6000, task ffff81000645c730) [ 80.958584] Stack: ffff81010141c288 00000000000001f4 ffff810004fb3200 ffffffff804b6f2d [ 80.961761] 00000000000001f4 ffff81010141c288 ffffffff804c8bd0 0000000000000000 [ 80.964681] ffff8100060e7ee8 ffffffff804bd094 ffff81000645c730 ffff8100060e7e70 [ 80.967518] Call Trace: [ 80.968558] [] release_stripe+0x3d/0x60 [ 80.970677] [] md_thread+0x0/0x100 [ 80.972629] [] raid5d+0x344/0x450 [ 80.974549] [] process_timeout+0x0/0x10 [ 80.976668] [] schedule_timeout+0x5a/0xd0 [ 80.978855] [] md_thread+0x0/0x100 [ 80.980807] [] md_thread+0x30/0x100 [ 80.982794] [] autoremove_wake_function+0x0/0x30 [ 80.985214] [] md_thread+0x0/0x100 [ 80.987167] [] kthread+0x4b/0x80 [ 80.989054] [] child_rip+0xa/0x12 [ 80.990972] [] kthread+0x0/0x80 [ 80.992824] [] child_rip+0x0/0x12 [ 80.994743] [ 80.995588] [ 80.995588] Code: 0f 0b eb fe 0f 1f 84 00 00 00 00 00 
48 83 ec 28 48 89 5c 24 [ 80.999307] RIP [] __release_stripe+0x164/0x170 [ 81.001711] RSP Switching back to unpatched 2.6.23-mm1 boots successfully... Torsten From owner-xfs@oss.sgi.com Mon Nov 5 10:56:11 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 05 Nov 2007 10:56:17 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from el-out-1112.google.com (el-out-1112.google.com [209.85.162.178]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA5Iu8EY013384 for ; Mon, 5 Nov 2007 10:56:10 -0800 Received: by el-out-1112.google.com with SMTP id v27so338910ele for ; Mon, 05 Nov 2007 10:56:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=+8MSj2380Q4bne6A0aYLYo6tXpUEmdg3tUONwknfKXE=; b=l4kdhEpjrLp+yVSA0V70Kvdi85cG45TgsFKhUyVwpFYgqz7yaJ6rg89fFmQ4dSmCreLDolmBd8BK/wBIjVMe9ZiltrhkSfGNYrRqhVKU+7begyyoP32vnSCkfjiVNMmyfLFWE7eCL+wSL0uE7p9Qj/wPpDanmoXXtPIGJfVkTts= DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=beta; h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=YRi4YBQeIbdpFNafcL72iTJxvnB6Cskq1NRmOBYQQSHVqyGLlAXZ02vzLu5O4LY2X0TWpUkjwgFtmgGIa2aBV+N3eLP1i4f0RmZ8n8bGjFdtpYeW/lupdiADHPl3JKzWn1xcHz0OP+LmqVeloErmZNL6d6PcYIq5kwpyQn+zhd0= Received: by 10.142.201.3 with SMTP id y3mr831842wff.1194287236809; Mon, 05 Nov 2007 10:27:16 -0800 (PST) Received: by 10.65.112.13 with HTTP; Mon, 5 Nov 2007 10:27:16 -0800 (PST) Message-ID: <64bb37e0711051027v49869699s9593ea54713b15ff@mail.gmail.com> Date: Mon, 5 Nov 2007 19:27:16 +0100 From: "Torsten Kaiser" To: "David Chinner" Subject: Re:
writeout stalls in current -git Cc: "Peter Zijlstra" , "Fengguang Wu" , "Maxim Levitsky" , linux-kernel@vger.kernel.org, "Andrew Morton" , linux-fsdevel@vger.kernel.org, xfs@oss.sgi.com In-Reply-To: <20071105014510.GU66820511@sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <393060478.03650@ustc.edu.cn> <393903856.06449@ustc.edu.cn> <64bb37e0711011120i63cdfe3ci18995d57b6649a8@mail.gmail.com> <64bb37e0711011200n228e708eg255640388f83da22@mail.gmail.com> <1193998532.27652.343.camel@twins> <64bb37e0711021222q7d12c825mc62d433c4fe19e8@mail.gmail.com> <20071102204258.GR995458@sgi.com> <64bb37e0711040319l5de285c3xea64474540a51b6e@mail.gmail.com> <20071105014510.GU66820511@sgi.com> X-Virus-Scanned: ClamAV 0.91.2/4675/Mon Nov 5 08:20:43 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13554 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: just.for.lkml@googlemail.com Precedence: bulk X-list: xfs On 11/5/07, David Chinner wrote: > Ok, so it's probably a side effect of the writeback changes. > > Attached are two patches (two because one was in a separate patchset as > a standalone change) that should prevent async writeback from blocking > on locked inode cluster buffers. Apply the xfs-factor-inotobp patch first. > Can you see if this fixes the problem? Now testing v2.6.24-rc1-650-gb55d1b1 + the fix for the misapplied raid5-patch. Applying your two patches on top of that does not fix the stalls.
vmstat 10 output from unmerging (uninstalling) a kernel: 1 0 0 3512188 332 192644 0 0 185 12 368 735 10 3 85 1 -> emerge starts to remove the kernel source files 3 0 0 3506624 332 192836 0 0 15 9825 2458 8307 7 12 81 0 0 0 0 3507212 332 192836 0 0 0 554 630 1233 0 1 99 0 0 0 0 3507292 332 192836 0 0 0 537 580 1328 0 1 99 0 0 0 0 3507168 332 192836 0 0 0 633 626 1380 0 1 99 0 0 0 0 3507116 332 192836 0 0 0 1510 768 2030 1 2 97 0 0 0 0 3507596 332 192836 0 0 0 524 540 1544 0 0 99 0 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 0 0 3507540 332 192836 0 0 0 489 551 1293 0 0 99 0 0 0 0 3507528 332 192836 0 0 0 527 510 1432 1 1 99 0 0 0 0 3508052 332 192840 0 0 0 2088 910 2964 2 3 95 0 0 0 0 3507888 332 192840 0 0 0 442 565 1383 1 1 99 0 0 0 0 3508704 332 192840 0 0 0 497 529 1479 0 0 99 0 0 0 0 3508704 332 192840 0 0 0 594 595 1458 0 0 99 0 0 0 0 3511492 332 192840 0 0 0 2381 1028 2941 2 3 95 0 0 0 0 3510684 332 192840 0 0 0 699 600 1390 0 0 99 0 0 0 0 3511636 332 192840 0 0 0 741 661 1641 0 0 100 0 0 0 0 3524020 332 192840 0 0 0 2452 1080 3910 2 3 95 0 0 0 0 3524040 332 192844 0 0 0 530 617 1297 0 0 99 0 0 0 0 3524128 332 192844 0 0 0 812 674 1667 0 1 99 0 0 0 0 3527000 332 193672 0 0 339 721 754 1681 3 2 93 1 -> emerge is finished, no dirty or writeback data in /proc/meminfo 0 0 0 3571056 332 194768 0 0 111 639 632 1344 0 1 99 0 0 0 0 3571260 332 194768 0 0 0 757 688 1405 1 0 99 0 0 0 0 3571156 332 194768 0 0 0 753 641 1361 0 0 99 0 0 0 0 3571404 332 194768 0 0 0 766 653 1389 0 0 99 0 1 0 0 3571136 332 194768 0 0 6 764 669 1488 0 0 99 0 0 0 0 3571668 332 194824 0 0 0 764 657 1482 0 0 99 0 0 0 0 3571848 332 194824 0 0 0 673 659 1406 0 0 99 0 0 0 0 3571908 332 195052 0 0 22 753 638 1500 0 1 99 0 0 0 0 3573052 332 195052 0 0 0 765 631 1482 0 1 99 0 0 0 0 3574144 332 195052 0 0 0 771 640 1497 0 0 99 0 0 0 0 3573468 332 195052 0 0 0 458 485 1251 0 0 99 0 0 0 0 3574184 332 195052 0 0 0 
427 474 1192 0 0 100 0 0 0 0 3575092 332 195052 0 0 0 461 482 1235 0 0 99 0 0 0 0 3576368 332 195056 0 0 0 582 556 1310 0 0 99 0 0 0 0 3579300 332 195056 0 0 0 695 571 1402 0 0 99 0 0 0 0 3580376 332 195056 0 0 0 417 568 906 0 0 99 0 0 0 0 3581212 332 195056 0 0 0 421 559 977 0 1 99 0 0 0 0 3583780 332 195060 0 0 0 494 555 1080 0 1 99 0 0 0 0 3584352 332 195060 0 0 0 99 347 559 0 0 99 0 0 0 0 3585232 332 195060 0 0 0 11 301 621 0 0 99 0 -> disks go idle. So these patches do not seem to be the source of these excessive disk writes... Torsten From owner-xfs@oss.sgi.com Mon Nov 5 11:41:17 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 05 Nov 2007 11:41:25 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=0.9 required=5.0 tests=AWL,BAYES_00,HTML_MESSAGE, J_CHICKENPOX_42 autolearn=no version=3.3.0-r574664 Received: from nz-out-0506.google.com (nz-out-0506.google.com [64.233.162.232]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA5JfEdg019444 for ; Mon, 5 Nov 2007 11:41:16 -0800 Received: by nz-out-0506.google.com with SMTP id x3so1002294nzd for ; Mon, 05 Nov 2007 11:41:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:references; bh=ullirRnf7fsRwZBlH1YzvyFmG4YErAL/OUOqVi6uLS8=; b=IvQ36/C4C6tAVQ2rKuKSRGhgY93wMsz+WeKAR5lk4vtZc683+c/CVz4PTyYNHGteVzJTEMDSFmDELi+NSDmbliSJZUbgBriag9f40qUUB46i7EWhgLPtmlftSv+e2k+nWnUtfiL0iFWCmk0QCPZOcr36EfX7yHYT2BO1rqUw7ug= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:references; b=qogWKHDaP0qAOLEfPvv/Rb+hXm/gur0LikbahJvvscOwTK+pJQiKjuJpikQ9d+oLGMiBaY7wncddsBV3OOhmSVvNN3qpenqZ7Tk+k40+cv0TjIpZtfSx5RLJ6iVsOztlivtyH9VFpclbxqAq5pNjA6JUckc/1owbi+Q7eNEhZ7U= Received: by 10.142.229.4 with 
SMTP id b4mr1177219wfh.1194288172123; Mon, 05 Nov 2007 10:42:52 -0800 (PST) Received: by 10.142.162.19 with HTTP; Mon, 5 Nov 2007 10:42:52 -0800 (PST) Message-ID: Date: Tue, 6 Nov 2007 00:12:52 +0530 From: "Bhagi rathi" To: "David Chinner" Subject: Re: TAKE 972756 - Implement fallocate. Cc: sgi.bugs.xfs@engr.sgi.com, xfs@oss.sgi.com In-Reply-To: <20071102024314.9BF3458C38F7@chook.melbourne.sgi.com> MIME-Version: 1.0 References: <20071102024314.9BF3458C38F7@chook.melbourne.sgi.com> X-Virus-Scanned: ClamAV 0.91.2/4675/Mon Nov 5 08:20:43 2007 on oss.sgi.com X-Virus-Status: Clean Content-Type: text/plain Content-Disposition: inline Content-Transfer-Encoding: 7bit Content-length: 1134 X-archive-position: 13555 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jahnu77@gmail.com Precedence: bulk X-list: xfs David, What happens if offset is not aligned to 4k? Let's say we have a file whose size is not aligned to 4k. It could have blocks beyond the eof which haven't been zero'ed out. fallocate may increase the size and we can read garbage from disk-block if it hasn't been zero'ed out. -Thanks, Bhagi. On 11/2/07, David Chinner wrote: > > Implement fallocate. > > Implement the new generic callout for file preallocation. > Atomically change the file size if requested. 
> > > Date: Fri Nov 2 13:42:52 AEDT 2007 > Workarea: chook.melbourne.sgi.com:/build/dgc/isms/2.6.x-xfs > Inspected by: hch@infradead.org > > The following file(s) were checked into: > longdrop.melbourne.sgi.com:/isms/linux/2.6.x-xfs-melb > > > Modid: xfs-linux-melb:xfs-kern:30009a > fs/xfs/linux-2.6/xfs_iops.c - 1.268 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/> linux-2.6/xfs_iops.c.diff?r1=text&tr1=1.268&r2=text&tr2=1.267&f=h > > http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.6/xfs_iops.c.diff?r1=text&tr1=1.268&r2=text&tr2=1.267&f=h > - implement ->fallocate() > > > > [[HTML alternate version deleted]] From owner-xfs@oss.sgi.com Mon Nov 5 14:20:27 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 05 Nov 2007 14:20:32 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from spike.grumly.eu.org (spike.grumly.eu.org [195.5.253.226]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA5MKOe3026420 for ; Mon, 5 Nov 2007 14:20:27 -0800 Received: by spike.grumly.eu.org (Postfix, from userid 1001) id E1FD6118F0; Mon, 5 Nov 2007 22:51:35 +0100 (CET) Date: Mon, 5 Nov 2007 22:51:35 +0100 From: Cedric - Equinoxe Media To: xfs@oss.sgi.com Subject: xfs crash Message-ID: <20071105215135.GA12238@e-m.fr> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline Content-Transfer-Encoding: 8bit X-Virus-Scanned: ClamAV 0.91.2/4675/Mon Nov 5 08:20:43 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13556 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: cedric@e-m.fr Precedence: bulk X-list: xfs Hi, I got a crash with xfs serving nfs : Hardware is Dell Poweredge 2950 with RAID5 SAS Linux fng2 2.6.22-3-amd64 #1 SMP Wed Oct 31 13:43:07 UTC 2007 x86_64 GNU/Linux Here is the 
dmesg : NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory NFSD: starting 90-second grace period XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1563 of file fs/xfs/xfs_alloc.c. Caller 0xffffffff88113b35 Call Trace: [] :xfs:xfs_free_ag_extent+0x1a6/0x6b5 [] :xfs:xfs_free_extent+0xa9/0xc9 [] :xfs:xfs_bmap_finish+0xee/0x167 [] :xfs:xfs_itruncate_finish+0x19b/0x2e0 [] :xfs:xfs_setattr+0x841/0xe57 [] __mod_timer+0xc3/0xd3 [] task_rq_lock+0x3d/0x6f [] __activate_task+0x26/0x38 [] :xfs:xfs_vn_setattr+0x121/0x144 [] notify_change+0x156/0x2f1 [] :nfsd:nfsd_setattr+0x334/0x4b1 [] :nfsd:nfsd3_proc_setattr+0xa2/0xae [] :nfsd:nfsd_dispatch+0xdd/0x19e [] :sunrpc:svc_process+0x3df/0x6ef [] __down_read+0x12/0x9a [] :nfsd:nfsd+0x191/0x2ac [] child_rip+0xa/0x12 [] :nfsd:nfsd+0x0/0x2ac [] child_rip+0x0/0x12 xfs_force_shutdown(sda4,0x8) called from line 4258 of file fs/xfs/xfs_bmap.c. Return address = 0xffffffff8811cfb4 Filesystem "sda4": Corruption of in-memory data detected. Shutting down filesystem: sda4 Please umount the filesystem, and rectify the problem(s) nfsd: non-standard errno: -117 ---------------- Here I stopped nfs, umount -f /dev/sda4, mount /dev/sda4 then start nfs again. ---------------- nfsd: last server has exited nfsd: unexporting all filesystems xfs_force_shutdown(sda4,0x1) called from line 423 of file fs/xfs/xfs_rw.c. Return address = 0xffffffff88158289 xfs_force_shutdown(sda4,0x1) called from line 423 of file fs/xfs/xfs_rw.c. Return address = 0xffffffff88158289 XFS mounting filesystem sda4 Starting XFS recovery on filesystem: sda4 (logdev: internal) XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1563 of file fs/xfs/xfs_alloc.c. 
Caller 0xffffffff88113b35 Call Trace: [] :xfs:xfs_free_ag_extent+0x1a6/0x6b5 [] :xfs:xfs_free_extent+0xa9/0xc9 [] :xfs:xlog_recover_process_efi+0xf7/0x12a [] :xfs:xlog_recover_process_efis+0x4f/0x81 [] :xfs:xlog_recover_finish+0x19/0x9a [] :xfs:xfs_mountfs+0x83d/0x91b [] _atomic_dec_and_lock+0x39/0x58 [] :xfs:xfs_mount+0x317/0x39d [] :xfs:xfs_fs_fill_super+0x0/0x1a7 [] :xfs:xfs_fs_fill_super+0x7e/0x1a7 [] __down_write_nested+0x12/0x9a [] get_filesystem+0x12/0x35 [] sget+0x39d/0x3af [] set_bdev_super+0x0/0xf [] test_bdev_super+0x0/0xd [] get_sb_bdev+0x105/0x152 [] vfs_kern_mount+0x93/0x11a [] do_kern_mount+0x43/0xdd [] do_mount+0x691/0x708 [] mntput_no_expire+0x1c/0x94 [] link_path_walk+0xce/0xe0 [] activate_page+0xad/0xd4 [] find_get_page+0x21/0x50 [] filemap_nopage+0x180/0x2ab [] __handle_mm_fault+0x3e6/0x9d9 [] zone_statistics+0x3f/0x60 [] __up_read+0x13/0x8a [] __alloc_pages+0x5a/0x2bc [] sys_mount+0x8a/0xd7 [] system_call+0x7e/0x83 Ending XFS recovery on filesystem: sda4 (logdev: internal) NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory NFSD: starting 90-second grace period I have no other message in the dmesg, the server has latest RAID firmware from dell. Cédric. 
From owner-xfs@oss.sgi.com Mon Nov 5 16:12:34 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 05 Nov 2007 16:12:36 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_42 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA60CTqv007559 for ; Mon, 5 Nov 2007 16:12:33 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA12377; Tue, 6 Nov 2007 11:12:26 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lA60COdD94499571; Tue, 6 Nov 2007 11:12:25 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lA60CNUj95451472; Tue, 6 Nov 2007 11:12:23 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Tue, 6 Nov 2007 11:12:23 +1100 From: David Chinner To: Bhagi rathi Cc: David Chinner , xfs@oss.sgi.com Subject: Re: TAKE 972756 - Implement fallocate. Message-ID: <20071106001223.GY66820511@sgi.com> References: <20071102024314.9BF3458C38F7@chook.melbourne.sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4676/Mon Nov 5 13:20:22 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13557 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Tue, Nov 06, 2007 at 12:12:52AM +0530, Bhagi rathi wrote: > David, What happens if offset is not aligned to 4k? 
Let's say we have a file > whose size is > not aligned to 4k. It could have blocks beyond the eof which haven't been > zero'ed out. No it won't. They are *preallocated* blocks, which by definition are zero-filled. Preallocated blocks are marked as unwritten on disk, so it is known that they contain zeros, even if they lie beyond EOF. Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Mon Nov 5 20:25:51 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 05 Nov 2007 20:25:55 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA64PmG0016534 for ; Mon, 5 Nov 2007 20:25:50 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA16934; Tue, 6 Nov 2007 15:25:39 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lA64PZdD95507918; Tue, 6 Nov 2007 15:25:36 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lA64PRU691691920; Tue, 6 Nov 2007 15:25:27 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Tue, 6 Nov 2007 15:25:27 +1100 From: David Chinner To: Torsten Kaiser Cc: David Chinner , Peter Zijlstra , Fengguang Wu , Maxim Levitsky , linux-kernel@vger.kernel.org, Andrew Morton , linux-fsdevel@vger.kernel.org, xfs@oss.sgi.com Subject: Re: writeout stalls in current -git Message-ID: <20071106042527.GT995458@sgi.com> References: <393903856.06449@ustc.edu.cn> <64bb37e0711011120i63cdfe3ci18995d57b6649a8@mail.gmail.com> 
<64bb37e0711011200n228e708eg255640388f83da22@mail.gmail.com> <1193998532.27652.343.camel@twins> <64bb37e0711021222q7d12c825mc62d433c4fe19e8@mail.gmail.com> <20071102204258.GR995458@sgi.com> <64bb37e0711040319l5de285c3xea64474540a51b6e@mail.gmail.com> <20071105014510.GU66820511@sgi.com> <64bb37e0711051027v49869699s9593ea54713b15ff@mail.gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <64bb37e0711051027v49869699s9593ea54713b15ff@mail.gmail.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4678/Mon Nov 5 17:20:26 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13558 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Mon, Nov 05, 2007 at 07:27:16PM +0100, Torsten Kaiser wrote: > On 11/5/07, David Chinner wrote: > > Ok, so it's probably a side effect of the writeback changes. > > > > Attached are two patches (two because one was in a separate patchset as > > a standalone change) that should prevent async writeback from blocking > > on locked inode cluster buffers. Apply the xfs-factor-inotobp patch first. > > Can you see if this fixes the problem? > > Now testing v2.6.24-rc1-650-gb55d1b1 + the fix for the misapplied raid5-patch. > Applying your two patches on top of that does not fix the stalls. So you are having RAID5 problems as well? I'm struggling to understand what could possibly have changed in XFS or writeback that would lead to stalls like this, esp. as you appear to be removing files when the stalls occur. Rather than vmstat, can you use something like iostat to show how busy your disks are? i.e. are we seeing RMW cycles in the raid5 or some such issue? OOC, what is the 'xfs_info ' output for your filesystem?
> vmstat 10 output from unmerging (uninstalling) a kernel:
> 1 0 0 3512188 332 192644 0 0 185 12 368 735 10 3 85 1
> -> emerge starts to remove the kernel source files
> 3 0 0 3506624 332 192836 0 0 15 9825 2458 8307 7 12 81 0
> 0 0 0 3507212 332 192836 0 0 0 554 630 1233 0 1 99 0
> 0 0 0 3507292 332 192836 0 0 0 537 580 1328 0 1 99 0
> 0 0 0 3507168 332 192836 0 0 0 633 626 1380 0 1 99 0
> 0 0 0 3507116 332 192836 0 0 0 1510 768 2030 1 2 97 0
> 0 0 0 3507596 332 192836 0 0 0 524 540 1544 0 0 99 0
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 0 0 0 3507540 332 192836 0 0 0 489 551 1293 0 0 99 0
> 0 0 0 3507528 332 192836 0 0 0 527 510 1432 1 1 99 0
> 0 0 0 3508052 332 192840 0 0 0 2088 910 2964 2 3 95 0
> 0 0 0 3507888 332 192840 0 0 0 442 565 1383 1 1 99 0
> 0 0 0 3508704 332 192840 0 0 0 497 529 1479 0 0 99 0
> 0 0 0 3508704 332 192840 0 0 0 594 595 1458 0 0 99 0
> 0 0 0 3511492 332 192840 0 0 0 2381 1028 2941 2 3 95 0
> 0 0 0 3510684 332 192840 0 0 0 699 600 1390 0 0 99 0
> 0 0 0 3511636 332 192840 0 0 0 741 661 1641 0 0 100 0
> 0 0 0 3524020 332 192840 0 0 0 2452 1080 3910 2 3 95 0
> 0 0 0 3524040 332 192844 0 0 0 530 617 1297 0 0 99 0
> 0 0 0 3524128 332 192844 0 0 0 812 674 1667 0 1 99 0
> 0 0 0 3527000 332 193672 0 0 339 721 754 1681 3 2 93 1
> -> emerge is finished, no dirty or writeback data in /proc/meminfo

At this point, can you run a "sync" and see how long that takes to complete? The only thing I can think of that would be written out after this point is inodes, but even then it seems to go on for a long, long time and it really doesn't seem like XFS is holding up the inode writes.

Another option is to use blktrace/blkparse to determine which process is issuing this I/O.

> 0 0 0 3583780 332 195060 0 0 0 494 555 1080 0 1 99 0
> 0 0 0 3584352 332 195060 0 0 0 99 347 559 0 0 99 0
> 0 0 0 3585232 332 195060 0 0 0 11 301 621 0 0 99 0
> -> disks go idle.
>
> So these patches do not seem to be the source of these excessive disk writes...

Well, the patches I posted should prevent blocking in the places that it
was seen, so if that does not stop the slowdowns then either the writeback
code is not feeding us inodes fast enough or the block device below is
having some kind of problem....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

From owner-xfs@oss.sgi.com Mon Nov 5 23:10:20 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 05 Nov 2007 23:10:25 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.1 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_51 autolearn=no version=3.3.0-r574664 Received: from py-out-1112.google.com (py-out-1112.google.com [64.233.166.176]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA67AIrt002874 for ; Mon, 5 Nov 2007 23:10:20 -0800 Received: by py-out-1112.google.com with SMTP id u77so3727623pyb for ; Mon, 05 Nov 2007 23:10:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=googlemail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=EW1c170Cm1YvLTbkhmS2rWCfzevYC/msaicm3fzL84g=; b=WfqbkELWbx8eW7EW37gMiSpe89OHtBWvLgpdhuZNQM3VB1UCWI/2oIwv+AuPCeWoMun6jVxj1T3N1EumlFMF/T5pQivS1BMGr02htqeu9eLJw0qoWkN50hvncQXQffdRGa5YufUTgF/ZViVGh1/D2HpF1fZP2jTr3WwL7yzRYLQ= DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=beta; h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=GuKXd6gsRcXZUYdU3CtPQCkSJj73Jcl51yEV8Y4SGrwjFmZp2f64OKNvd6ra+B+xXqRNR165tNLLVCk59SxmKI0G3t/gw5Cqzansdlrw1+AKUTOOk2Czetu+omd3xzBBguIKriWUNOiUo3cDv/yeXRDeiAPJlf3Qnj57xVVKkK4= Received: by 10.65.153.10 with SMTP id f10mr9442879qbo.1194333021770; Mon, 05 Nov 2007 23:10:21 -0800 (PST) Received: by 10.65.112.13 with HTTP; Mon, 5 Nov 2007 23:10:21 -0800 (PST) Message-ID: <64bb37e0711052310r5214cf50nd148d989524490ea@mail.gmail.com> Date: Tue, 6 Nov 2007 08:10:21 +0100 From: "Torsten Kaiser" To: "David Chinner" Subject: Re: writeout stalls in current -git Cc: "Peter Zijlstra" , "Fengguang Wu" , "Maxim Levitsky" , linux-kernel@vger.kernel.org, "Andrew Morton" , linux-fsdevel@vger.kernel.org, xfs@oss.sgi.com In-Reply-To: <20071106042527.GT995458@sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <393903856.06449@ustc.edu.cn> <64bb37e0711011200n228e708eg255640388f83da22@mail.gmail.com> <1193998532.27652.343.camel@twins> <64bb37e0711021222q7d12c825mc62d433c4fe19e8@mail.gmail.com> <20071102204258.GR995458@sgi.com> <64bb37e0711040319l5de285c3xea64474540a51b6e@mail.gmail.com> <20071105014510.GU66820511@sgi.com> <64bb37e0711051027v49869699s9593ea54713b15ff@mail.gmail.com> <20071106042527.GT995458@sgi.com> X-Virus-Scanned: ClamAV 0.91.2/4680/Mon Nov 5 20:49:40 2007 on 
oss.sgi.com X-Virus-Status: Clean X-archive-position: 13560 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: just.for.lkml@googlemail.com Precedence: bulk X-list: xfs

On 11/6/07, David Chinner wrote:
> On Mon, Nov 05, 2007 at 07:27:16PM +0100, Torsten Kaiser wrote:
> > On 11/5/07, David Chinner wrote:
> > > Ok, so it's probably a side effect of the writeback changes.
> > >
> > > Attached are two patches (two because one was in a separate patchset as
> > > a standalone change) that should prevent async writeback from blocking
> > > on locked inode cluster buffers. Apply the xfs-factor-inotobp patch first.
> > > Can you see if this fixes the problem?
> >
> > Now testing v2.6.24-rc1-650-gb55d1b1+ the fix for the misapplied raid5-patch
> > Applying your two patches on top of that does not fix the stalls.
>
> So you are having RAID5 problems as well?

The first 2.6.24-rc1-git-kernel that I patched with your patches did
not boot for me. (Oops sent in one of my previous mails.) But given that
the stacktrace was not xfs related and I had seen this patch on the
lkml, I tried to fix this Oops this way.
I did not have troubles with the RAID5 otherwise.

> I'm struggling to understand what possibly changed in XFS or writeback that
> would lead to stalls like this, esp. as you appear to be removing files when
> the stalls occur. Rather than vmstat, can you use something like iostat to
> show how busy your disks are? i.e. are we seeing RMW cycles in the raid5 or
> some such issue.

Will do this this evening.

> OOC, what is the 'xfs_info ' output for your filesystem?
meta-data=/dev/mapper/root       isize=256    agcount=32, agsize=4731132 blks
         =                       sectsz=512   attr=1
data     =                       bsize=4096   blocks=151396224, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0

> > vmstat 10 output from unmerging (uninstalling) a kernel:
> > 1 0 0 3512188 332 192644 0 0 185 12 368 735 10 3 85 1
> > -> emerge starts to remove the kernel source files
> > 3 0 0 3506624 332 192836 0 0 15 9825 2458 8307 7 12 81 0
> > 0 0 0 3507212 332 192836 0 0 0 554 630 1233 0 1 99 0
> > 0 0 0 3507292 332 192836 0 0 0 537 580 1328 0 1 99 0
> > 0 0 0 3507168 332 192836 0 0 0 633 626 1380 0 1 99 0
> > 0 0 0 3507116 332 192836 0 0 0 1510 768 2030 1 2 97 0
> > 0 0 0 3507596 332 192836 0 0 0 524 540 1544 0 0 99 0
> > procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> > r b swpd free buff cache si so bi bo in cs us sy id wa
> > 0 0 0 3507540 332 192836 0 0 0 489 551 1293 0 0 99 0
> > 0 0 0 3507528 332 192836 0 0 0 527 510 1432 1 1 99 0
> > 0 0 0 3508052 332 192840 0 0 0 2088 910 2964 2 3 95 0
> > 0 0 0 3507888 332 192840 0 0 0 442 565 1383 1 1 99 0
> > 0 0 0 3508704 332 192840 0 0 0 497 529 1479 0 0 99 0
> > 0 0 0 3508704 332 192840 0 0 0 594 595 1458 0 0 99 0
> > 0 0 0 3511492 332 192840 0 0 0 2381 1028 2941 2 3 95 0
> > 0 0 0 3510684 332 192840 0 0 0 699 600 1390 0 0 99 0
> > 0 0 0 3511636 332 192840 0 0 0 741 661 1641 0 0 100 0
> > 0 0 0 3524020 332 192840 0 0 0 2452 1080 3910 2 3 95 0
> > 0 0 0 3524040 332 192844 0 0 0 530 617 1297 0 0 99 0
> > 0 0 0 3524128 332 192844 0 0 0 812 674 1667 0 1 99 0
> > 0 0 0 3527000 332 193672 0 0 339 721 754 1681 3 2 93 1
> > -> emerge is finished, no dirty or writeback data in /proc/meminfo
>
> At this point, can you run a "sync" and see how long that takes to
> complete?
Already tried that: http://lkml.org/lkml/2007/11/2/178
See the logs from the second unmerge in the second half of the mail.
The sync did not stop this writeout, but returned immediately.

> The only thing I can think of that would be written out after
> this point is inodes, but even then it seems to go on for a long,
> long time and it really doesn't seem like XFS is holding up the
> inode writes.

Yes, I completely agree that this is much too long. That's why I included
the after-emerge-finished parts of the logs. But I still partly
suspect xfs, because the xfssyncd shows up when I hit SysRq+W.

> Another option is to use blktrace/blkparse to determine which process is
> issuing this I/O.
>
> > 0 0 0 3583780 332 195060 0 0 0 494 555 1080 0 1 99 0
> > 0 0 0 3584352 332 195060 0 0 0 99 347 559 0 0 99 0
> > 0 0 0 3585232 332 195060 0 0 0 11 301 621 0 0 99 0
> > -> disks go idle.
> >
> > So these patches do not seem to be the source of these excessive disk writes...
>
> Well, the patches I posted should prevent blocking in the places that it
> was seen, so if that does not stop the slowdowns then either the writeback
> code is not feeding us inodes fast enough or the block device below is
> having some kind of problem....

I don't think it's the block device, because reading/writing larger
files does not seem to be troubled. It looks much more like an inode
problem. For example, both installing and uninstalling kernel source
trees show these stalls, but during uninstalling this is much more
noticeable.

But I agree that this might not be xfs specific, as this showed up at
the same time as other people started reporting about the 100% iowait bug.
Could be that this is the same bug and the differences between
reiserfs and xfs might explain the iowait vs. idle. Or the fact that I
don't see the 100% iowait is due to something else on my system...
Torsten From owner-xfs@oss.sgi.com Tue Nov 6 01:18:53 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 06 Nov 2007 01:19:17 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=0.4 required=5.0 tests=AWL,BAYES_50,J_CHICKENPOX_21, J_CHICKENPOX_23,J_CHICKENPOX_31,J_CHICKENPOX_42,J_CHICKENPOX_43, J_CHICKENPOX_44,J_CHICKENPOX_45,J_CHICKENPOX_46,J_CHICKENPOX_47, J_CHICKENPOX_48,J_CHICKENPOX_61,J_CHICKENPOX_62,J_CHICKENPOX_63, J_CHICKENPOX_64,J_CHICKENPOX_65,J_CHICKENPOX_73 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA69Ihsn002714 for ; Tue, 6 Nov 2007 01:18:45 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id UAA21882; Tue, 6 Nov 2007 20:18:38 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lA69IbdD95551378; Tue, 6 Nov 2007 20:18:38 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lA69Ia1s95104290; Tue, 6 Nov 2007 20:18:36 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Tue, 6 Nov 2007 20:18:36 +1100 From: David Chinner To: xfs-dev Cc: xfs-oss Subject: [PATCH,RFC] Factor some btree code.... 
Message-ID: <20071106091836.GV995458@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4680/Mon Nov 5 20:49:40 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13562 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs

Only a small patch. ;)

Basically, I need to introduce new formats to some of the btree block
fields (crc, uuids) for resilience and recovery purposes. Rather than have
to copy large chunks of three separate btree implementations, I decided
that I'd factor them into one implementation first.

The approach I took was to build a bunch of ops structures that each
different btree structure could implement. Basically, all the btrees
do the same fundamental operations, so it should be easy to do. Right?

I've formed a "btree core" set of functions that operate on:

	xfs_btree_block_t - a generic btree block
	xfs_btree_key_t - a union of the different key types
	xfs_btree_ptr_t - a union of the different pointer types
	xfs_btree_rec_t - a union of the different record types

These are passed around the core btree code in disk endian format and
the callouts convert to/from disk endian format as needed. There are
operations for initialising keys, ptrs and records from either the cursor
or other keys or records. There are operations for moving them, getting
the address within the block of a given index within a block, logging the
changes made etc. There are various block operations e.g. allocating and
freeing blocks, logging block headers, etc in a separate ops structure.
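As a rough illustration of the "unions plus per-btree callouts" idea described above, here is a minimal user-space sketch. Every `demo_*` name is invented for the example and is not taken from the patch, and the real callouts cover far more (records, pointers, logging, block allocation). The sketch only shows a generic search routine that keeps keys in disk endian format and leaves all interpretation of the key union to a comparison callout.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch: every demo_* name is invented for illustration and
 * is not taken from the patch.
 */

/* A 32-bit big-endian ("disk endian") value, kept as raw bytes. */
typedef struct { uint8_t b[4]; } demo_be32;

static inline demo_be32 demo_cpu_to_be32(uint32_t v)
{
	demo_be32 r = { { (uint8_t)(v >> 24), (uint8_t)(v >> 16),
			  (uint8_t)(v >> 8), (uint8_t)v } };
	return r;
}

static inline uint32_t demo_be32_to_cpu(demo_be32 v)
{
	return ((uint32_t)v.b[0] << 24) | ((uint32_t)v.b[1] << 16) |
	       ((uint32_t)v.b[2] << 8) | (uint32_t)v.b[3];
}

/* One key layout per btree type, joined in a union the core passes around. */
struct demo_alloc_key { demo_be32 startblock; demo_be32 blockcount; };
struct demo_inobt_key { demo_be32 startino; };

union demo_btree_key {
	struct demo_alloc_key alloc;
	struct demo_inobt_key inobt;
};

/* Per-btree callouts: only these look inside the union. */
struct demo_btree_ops {
	/* Return <0, 0 or >0 as key sorts before, equal to or after want. */
	int (*key_cmp)(const union demo_btree_key *key,
		       const union demo_btree_key *want);
};

static inline int demo_cmp_u32(uint32_t a, uint32_t b)
{
	return (a > b) - (a < b);
}

/* By-block freespace tree: order on start block only. */
static int demo_bno_key_cmp(const union demo_btree_key *key,
			    const union demo_btree_key *want)
{
	return demo_cmp_u32(demo_be32_to_cpu(key->alloc.startblock),
			    demo_be32_to_cpu(want->alloc.startblock));
}

/* Inode tree: order on starting inode number. */
static int demo_ino_key_cmp(const union demo_btree_key *key,
			    const union demo_btree_key *want)
{
	return demo_cmp_u32(demo_be32_to_cpu(key->inobt.startino),
			    demo_be32_to_cpu(want->inobt.startino));
}

const struct demo_btree_ops demo_bno_ops = { demo_bno_key_cmp };
const struct demo_btree_ops demo_ino_ops = { demo_ino_key_cmp };

/*
 * "Core" code: binary search over a sorted key array. It never converts
 * endianness or inspects a key itself; it only calls the ops.
 * Returns the matching index, or -1 if the key is absent.
 */
int demo_btree_find(const struct demo_btree_ops *ops,
		    const union demo_btree_key *keys, int nkeys,
		    const union demo_btree_key *want)
{
	int lo = 0, hi = nkeys - 1;

	assert(ops && ops->key_cmp);
	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;
		int c = ops->key_cmp(&keys[mid], want);

		if (c == 0)
			return mid;
		if (c < 0)
			lo = mid + 1;
		else
			hi = mid - 1;
	}
	return -1;
}
```

In this sketch, supporting another tree type means adding a key layout to the union and one more ops table; `demo_btree_find()` itself stays unchanged, which is the property the factoring is after.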
Some of the remaining operations are lumped into a "cursor ops" structure -
I think I'll probably fold them back into the block ops structure, or even
just make it one large ops structure for everything - there's really no need
for multiple ops structures, except for....

... the btree tracing code. I haven't completed that yet, but the btree
core inherits the tracing code from the bmap btree code, so we'll have
fine-grained tracing on all btree operations once this is complete.

The core btree code also got factored and commenting was improved; the
result is that the code is now readable and understandable, which it
certainly wasn't before I began this.

A further feature is that the core btree code now supports the btree root
being placed in an inode. I still need to move the extent format code into
the core as well as some of the root manipulation code, but in future
the only difference between a pointer rooted btree (eg freespace trees) and
an inode rooted btree (inode extent btree) will be a single flag being
set during the btree cursor initialisation.

The result of all this is a massive patch that cleans up a lot of stuff,
introduces new functionality into the btree code and reduces each btree
implementation down to a relatively simple set of operations to write.
The freespace btrees (xfs_alloc_btree.c) have gone from 2200 lines to 900,
the inode btree (xfs_ialloc_btree.c) has gone from 2000 lines to less than
800, and the bmap btree has gone from ~2600 lines to ~1400. There's probably
more that can be trimmed as well. On top of this, modifying the btree
structures will now involve writing only a handful of new functions instead
of duplicating most of those three files mentioned above.

The next question - does it work? Well, apart from test 042 (massively
fragmented file and freespace btree) and occasional 013 (fstress) and 083
(fstress @ ENOSPC) corruptions, it runs fine.
Indeed, I just did an apt-get update that replaced about 500MB of the
binaries on the root drive of my test box, updated a git tree and rebuilt
a kernel and the filesystem survived that just fine.

So, while I would not recommend it for production yet, it's definitely
usable. The problems remaining stem from level 3 btrees and larger, and I
need the btree tracing code working to trace those problems (it doesn't
work yet).

There's plenty still to clean up in the patch, but I thought that pushing it
out early for comment would be better than leaving it until I had everything
working.

Thoughts, comments, flames? (Eric, I'm looking at you and your 3-way diffstats ;)

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

---
 fs/xfs/xfs.h | 2
 fs/xfs/xfs_alloc.c | 48
 fs/xfs/xfs_alloc_btree.c | 2611 +++++++++---------------------------
 fs/xfs/xfs_alloc_btree.h | 2
 fs/xfs/xfs_bmap.c | 58
 fs/xfs/xfs_bmap_btree.c | 3307 ++++++++++++++--------------------------------
 fs/xfs/xfs_bmap_btree.h | 8
 fs/xfs/xfs_btree.c | 351 +++-
 fs/xfs/xfs_btree.h | 419 +++++
 fs/xfs/xfs_btree_core.c | 2299 +++++++++++++++++++++++++++++++
 fs/xfs/xfs_btree_trace.c | 202 ++
 fs/xfs/xfs_ialloc.c | 24
 fs/xfs/xfs_ialloc_btree.c | 2399 +++++++---------------------
 fs/xfs/xfs_ialloc_btree.h | 2
 fs/xfs/xfs_itable.c | 6
 15 files changed, 5497 insertions(+), 6241 deletions(-)

Index: 2.6.x-xfs-new/fs/xfs/xfs.h
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs.h 2007-09-12 15:41:22.000000000 +1000
+++ 2.6.x-xfs-new/fs/xfs/xfs.h 2007-11-06 19:40:29.694676106 +1100
@@ -30,7 +30,7 @@
 #define XFS_ATTR_TRACE 1
 #define XFS_BLI_TRACE 1
 #define XFS_BMAP_TRACE 1
-#define XFS_BMBT_TRACE 1
+#define XFS_BTREE_TRACE 1
 #define XFS_DIR2_TRACE 1
 #define XFS_DQUOT_TRACE 1
 #define XFS_ILOCK_TRACE 1
Index: 2.6.x-xfs-new/fs/xfs/xfs_alloc.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_alloc.c
2007-10-16 08:52:58.000000000 +1000 +++ 2.6.x-xfs-new/fs/xfs/xfs_alloc.c 2007-11-06 19:40:29.694676106 +1100 @@ -334,7 +334,7 @@ xfs_alloc_fixup_trees( /* * Delete the entry from the by-size btree. */ - if ((error = xfs_alloc_delete(cnt_cur, &i))) + if ((error = xfs_btree_delete(cnt_cur, &i))) return error; XFS_WANT_CORRUPTED_RETURN(i == 1); /* @@ -344,7 +344,7 @@ xfs_alloc_fixup_trees( if ((error = xfs_alloc_lookup_eq(cnt_cur, nfbno1, nflen1, &i))) return error; XFS_WANT_CORRUPTED_RETURN(i == 0); - if ((error = xfs_alloc_insert(cnt_cur, &i))) + if ((error = xfs_btree_insert(cnt_cur, &i))) return error; XFS_WANT_CORRUPTED_RETURN(i == 1); } @@ -352,7 +352,7 @@ xfs_alloc_fixup_trees( if ((error = xfs_alloc_lookup_eq(cnt_cur, nfbno2, nflen2, &i))) return error; XFS_WANT_CORRUPTED_RETURN(i == 0); - if ((error = xfs_alloc_insert(cnt_cur, &i))) + if ((error = xfs_btree_insert(cnt_cur, &i))) return error; XFS_WANT_CORRUPTED_RETURN(i == 1); } @@ -363,7 +363,7 @@ xfs_alloc_fixup_trees( /* * No remaining freespace, just delete the by-block tree entry. 
*/ - if ((error = xfs_alloc_delete(bno_cur, &i))) + if ((error = xfs_btree_delete(bno_cur, &i))) return error; XFS_WANT_CORRUPTED_RETURN(i == 1); } else { @@ -380,7 +380,7 @@ xfs_alloc_fixup_trees( if ((error = xfs_alloc_lookup_eq(bno_cur, nfbno2, nflen2, &i))) return error; XFS_WANT_CORRUPTED_RETURN(i == 0); - if ((error = xfs_alloc_insert(bno_cur, &i))) + if ((error = xfs_btree_insert(bno_cur, &i))) return error; XFS_WANT_CORRUPTED_RETURN(i == 1); } @@ -819,7 +819,7 @@ xfs_alloc_ag_vextent_near( XFS_WANT_CORRUPTED_GOTO(i == 1, error0); if (ltlen >= args->minlen) break; - if ((error = xfs_alloc_increment(cnt_cur, 0, &i))) + if ((error = xfs_btree_increment(cnt_cur, 0, &i))) goto error0; } while (i); ASSERT(ltlen >= args->minlen); @@ -829,7 +829,7 @@ xfs_alloc_ag_vextent_near( i = cnt_cur->bc_ptrs[0]; for (j = 1, blen = 0, bdiff = 0; !error && j && (blen < args->maxlen || bdiff > 0); - error = xfs_alloc_increment(cnt_cur, 0, &j)) { + error = xfs_btree_increment(cnt_cur, 0, &j)) { /* * For each entry, decide if it's better than * the previous best entry. @@ -939,7 +939,7 @@ xfs_alloc_ag_vextent_near( * Increment the cursor, so we will point at the entry just right * of the leftward entry if any, or to the leftmost entry. */ - if ((error = xfs_alloc_increment(bno_cur_gt, 0, &i))) + if ((error = xfs_btree_increment(bno_cur_gt, 0, &i))) goto error0; if (!i) { /* @@ -962,7 +962,7 @@ xfs_alloc_ag_vextent_near( args->alignment, args->minlen, &ltbnoa, &ltlena)) break; - if ((error = xfs_alloc_decrement(bno_cur_lt, 0, &i))) + if ((error = xfs_btree_decrement(bno_cur_lt, 0, &i))) goto error0; if (!i) { xfs_btree_del_cursor(bno_cur_lt, @@ -978,7 +978,7 @@ xfs_alloc_ag_vextent_near( args->alignment, args->minlen, &gtbnoa, &gtlena)) break; - if ((error = xfs_alloc_increment(bno_cur_gt, 0, &i))) + if ((error = xfs_btree_increment(bno_cur_gt, 0, &i))) goto error0; if (!i) { xfs_btree_del_cursor(bno_cur_gt, @@ -1067,7 +1067,7 @@ xfs_alloc_ag_vextent_near( /* * Fell off the right end.
*/ - if ((error = xfs_alloc_increment( + if ((error = xfs_btree_increment( bno_cur_gt, 0, &i))) goto error0; if (!i) { @@ -1163,7 +1163,7 @@ xfs_alloc_ag_vextent_near( /* * Fell off the left end. */ - if ((error = xfs_alloc_decrement( + if ((error = xfs_btree_decrement( bno_cur_lt, 0, &i))) goto error0; if (!i) { @@ -1322,7 +1322,7 @@ xfs_alloc_ag_vextent_size( bestflen = flen; bestfbno = fbno; for (;;) { - if ((error = xfs_alloc_decrement(cnt_cur, 0, &i))) + if ((error = xfs_btree_decrement(cnt_cur, 0, &i))) goto error0; if (i == 0) break; @@ -1417,7 +1417,7 @@ xfs_alloc_ag_vextent_small( xfs_extlen_t flen; int i; - if ((error = xfs_alloc_decrement(ccur, 0, &i))) + if ((error = xfs_btree_decrement(ccur, 0, &i))) goto error0; if (i) { if ((error = xfs_alloc_get_rec(ccur, &fbno, &flen, &i))) @@ -1550,7 +1550,7 @@ xfs_free_ag_extent( * Look for a neighboring block on the right (higher block numbers) * that is contiguous with this space. */ - if ((error = xfs_alloc_increment(bno_cur, 0, &haveright))) + if ((error = xfs_btree_increment(bno_cur, 0, &haveright))) goto error0; if (haveright) { /* @@ -1589,7 +1589,7 @@ xfs_free_ag_extent( if ((error = xfs_alloc_lookup_eq(cnt_cur, ltbno, ltlen, &i))) goto error0; XFS_WANT_CORRUPTED_GOTO(i == 1, error0); - if ((error = xfs_alloc_delete(cnt_cur, &i))) + if ((error = xfs_btree_delete(cnt_cur, &i))) goto error0; XFS_WANT_CORRUPTED_GOTO(i == 1, error0); /* @@ -1598,19 +1598,19 @@ xfs_free_ag_extent( if ((error = xfs_alloc_lookup_eq(cnt_cur, gtbno, gtlen, &i))) goto error0; XFS_WANT_CORRUPTED_GOTO(i == 1, error0); - if ((error = xfs_alloc_delete(cnt_cur, &i))) + if ((error = xfs_btree_delete(cnt_cur, &i))) goto error0; XFS_WANT_CORRUPTED_GOTO(i == 1, error0); /* * Delete the old by-block entry for the right block. */ - if ((error = xfs_alloc_delete(bno_cur, &i))) + if ((error = xfs_btree_delete(bno_cur, &i))) goto error0; XFS_WANT_CORRUPTED_GOTO(i == 1, error0); /* * Move the by-block cursor back to the left neighbor. 
*/ - if ((error = xfs_alloc_decrement(bno_cur, 0, &i))) + if ((error = xfs_btree_decrement(bno_cur, 0, &i))) goto error0; XFS_WANT_CORRUPTED_GOTO(i == 1, error0); #ifdef DEBUG @@ -1649,14 +1649,14 @@ xfs_free_ag_extent( if ((error = xfs_alloc_lookup_eq(cnt_cur, ltbno, ltlen, &i))) goto error0; XFS_WANT_CORRUPTED_GOTO(i == 1, error0); - if ((error = xfs_alloc_delete(cnt_cur, &i))) + if ((error = xfs_btree_delete(cnt_cur, &i))) goto error0; XFS_WANT_CORRUPTED_GOTO(i == 1, error0); /* * Back up the by-block cursor to the left neighbor, and * update its length. */ - if ((error = xfs_alloc_decrement(bno_cur, 0, &i))) + if ((error = xfs_btree_decrement(bno_cur, 0, &i))) goto error0; XFS_WANT_CORRUPTED_GOTO(i == 1, error0); nbno = ltbno; @@ -1675,7 +1675,7 @@ xfs_free_ag_extent( if ((error = xfs_alloc_lookup_eq(cnt_cur, gtbno, gtlen, &i))) goto error0; XFS_WANT_CORRUPTED_GOTO(i == 1, error0); - if ((error = xfs_alloc_delete(cnt_cur, &i))) + if ((error = xfs_btree_delete(cnt_cur, &i))) goto error0; XFS_WANT_CORRUPTED_GOTO(i == 1, error0); /* @@ -1694,7 +1694,7 @@ xfs_free_ag_extent( else { nbno = bno; nlen = len; - if ((error = xfs_alloc_insert(bno_cur, &i))) + if ((error = xfs_btree_insert(bno_cur, &i))) goto error0; XFS_WANT_CORRUPTED_GOTO(i == 1, error0); } @@ -1706,7 +1706,7 @@ xfs_free_ag_extent( if ((error = xfs_alloc_lookup_eq(cnt_cur, nbno, nlen, &i))) goto error0; XFS_WANT_CORRUPTED_GOTO(i == 0, error0); - if ((error = xfs_alloc_insert(cnt_cur, &i))) + if ((error = xfs_btree_insert(cnt_cur, &i))) goto error0; XFS_WANT_CORRUPTED_GOTO(i == 1, error0); xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR); Index: 2.6.x-xfs-new/fs/xfs/xfs_alloc_btree.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_alloc_btree.c 2007-05-22 19:04:51.000000000 +1000 +++ 2.6.x-xfs-new/fs/xfs/xfs_alloc_btree.c 2007-11-06 19:40:29.702675076 +1100 @@ -39,519 +39,119 @@ #include "xfs_alloc.h" #include "xfs_error.h" + /* - * Prototypes for 
internal functions. + * Get the block pointer for the given level of the cursor. + * Fill in the buffer pointer, if applicable. */ +STATIC xfs_btree_block_t * +xfs_alloc_get_block( + xfs_btree_cur_t *cur, + int level, + xfs_buf_t **bpp) +{ + ASSERT(level < cur->bc_nlevels); + *bpp = cur->bc_bufs[level]; + return (xfs_btree_block_t *)XFS_BUF_TO_ALLOC_BLOCK(*bpp); +} -STATIC void xfs_alloc_log_block(xfs_trans_t *, xfs_buf_t *, int); -STATIC void xfs_alloc_log_keys(xfs_btree_cur_t *, xfs_buf_t *, int, int); -STATIC void xfs_alloc_log_ptrs(xfs_btree_cur_t *, xfs_buf_t *, int, int); -STATIC void xfs_alloc_log_recs(xfs_btree_cur_t *, xfs_buf_t *, int, int); -STATIC int xfs_alloc_lshift(xfs_btree_cur_t *, int, int *); -STATIC int xfs_alloc_newroot(xfs_btree_cur_t *, int *); -STATIC int xfs_alloc_rshift(xfs_btree_cur_t *, int, int *); -STATIC int xfs_alloc_split(xfs_btree_cur_t *, int, xfs_agblock_t *, - xfs_alloc_key_t *, xfs_btree_cur_t **, int *); -STATIC int xfs_alloc_updkey(xfs_btree_cur_t *, xfs_alloc_key_t *, int); -/* - * Internal functions. - */ +STATIC int +xfs_alloc_get_buf( + xfs_btree_cur_t *cur, + xfs_btree_ptr_t *ptr, + int flags, + xfs_buf_t **bpp) +{ + xfs_buf_t *bp; -/* - * Single level of the xfs_alloc_delete record deletion routine. - * Delete record pointed to by cur/level. - * Remove the record from its block then rebalance the tree. - * Return 0 for error, 1 for done, 2 to go on to the next level. 
- */ -STATIC int /* error */ -xfs_alloc_delrec( - xfs_btree_cur_t *cur, /* btree cursor */ - int level, /* level removing record from */ - int *stat) /* fail/done/go-on */ + bp = xfs_btree_get_bufs(cur->bc_mp, cur->bc_tp, cur->bc_private.a.agno, + be32_to_cpu(ptr->u.alloc), flags); + *bpp = bp; + return 0; + +} + +STATIC int +xfs_alloc_read_buf( + xfs_btree_cur_t *cur, + xfs_btree_ptr_t *ptr, + int flags, + xfs_buf_t **bpp) { - xfs_agf_t *agf; /* allocation group freelist header */ - xfs_alloc_block_t *block; /* btree block record/key lives in */ - xfs_agblock_t bno; /* btree block number */ - xfs_buf_t *bp; /* buffer for block */ - int error; /* error return value */ - int i; /* loop index */ - xfs_alloc_key_t key; /* kp points here if block is level 0 */ - xfs_agblock_t lbno; /* left block's block number */ - xfs_buf_t *lbp; /* left block's buffer pointer */ - xfs_alloc_block_t *left; /* left btree block */ - xfs_alloc_key_t *lkp=NULL; /* left block key pointer */ - xfs_alloc_ptr_t *lpp=NULL; /* left block address pointer */ - int lrecs=0; /* number of records in left block */ - xfs_alloc_rec_t *lrp; /* left block record pointer */ - xfs_mount_t *mp; /* mount structure */ - int ptr; /* index in btree block for this rec */ - xfs_agblock_t rbno; /* right block's block number */ - xfs_buf_t *rbp; /* right block's buffer pointer */ - xfs_alloc_block_t *right; /* right btree block */ - xfs_alloc_key_t *rkp; /* right block key pointer */ - xfs_alloc_ptr_t *rpp; /* right block address pointer */ - int rrecs=0; /* number of records in right block */ - int numrecs; - xfs_alloc_rec_t *rrp; /* right block record pointer */ - xfs_btree_cur_t *tcur; /* temporary btree cursor */ + return xfs_btree_read_bufs(cur->bc_mp, + cur->bc_tp, cur->bc_private.a.agno, + be32_to_cpu(ptr->u.alloc), flags, + bpp, XFS_ALLOC_BTREE_REF); +} +STATIC xfs_btree_block_t * +xfs_alloc_buf_to_block( + xfs_btree_cur_t *cur, + xfs_buf_t *bp) +{ + /* XFS_BUF_TO_ALLOC_BLOCK(rbp); */ + return 
XFS_BUF_TO_BLOCK(bp); +} + +STATIC void +xfs_alloc_buf_to_ptr( + xfs_btree_cur_t *cur, + xfs_buf_t *bp, + xfs_btree_ptr_t *ptr) +{ + ptr->u.alloc = cpu_to_be32(XFS_DADDR_TO_AGBNO(cur->bc_mp, XFS_BUF_ADDR(bp))); +} + +STATIC int +xfs_alloc_alloc_block( + xfs_btree_cur_t *cur, + xfs_btree_ptr_t *start, + xfs_btree_ptr_t *new, + int length, + int *stat) +{ + int error; + xfs_agblock_t bno; + + XFS_BTREE_TRACE_CURSOR(cur, ENTER); /* - * Get the index of the entry being deleted, check for nothing there. - */ - ptr = cur->bc_ptrs[level]; - if (ptr == 0) { - *stat = 0; - return 0; - } - /* - * Get the buffer & block containing the record or key/ptr. + * Allocate the new block from the freelist. + * If we can't do it, we're toast. Give up. */ - bp = cur->bc_bufs[level]; - block = XFS_BUF_TO_ALLOC_BLOCK(bp); -#ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, block, level, bp))) + error = xfs_alloc_get_freelist(cur->bc_tp, + cur->bc_private.a.agbp, &bno, 1); + if (error) { + XFS_BTREE_TRACE_CURSOR(cur, ERROR); return error; -#endif - /* - * Fail if we're off the end of the block. - */ - numrecs = be16_to_cpu(block->bb_numrecs); - if (ptr > numrecs) { + } + if (bno == NULLAGBLOCK) { + XFS_BTREE_TRACE_CURSOR(cur, EXIT); *stat = 0; return 0; } - XFS_STATS_INC(xs_abt_delrec); - /* - * It's a nonleaf. Excise the key and ptr being deleted, by - * sliding the entries past them down one. - * Log the changed areas of the block. - */ - if (level > 0) { - lkp = XFS_ALLOC_KEY_ADDR(block, 1, cur); - lpp = XFS_ALLOC_PTR_ADDR(block, 1, cur); -#ifdef DEBUG - for (i = ptr; i < numrecs; i++) { - if ((error = xfs_btree_check_sptr(cur, be32_to_cpu(lpp[i]), level))) - return error; - } -#endif - if (ptr < numrecs) { - memmove(&lkp[ptr - 1], &lkp[ptr], - (numrecs - ptr) * sizeof(*lkp)); - memmove(&lpp[ptr - 1], &lpp[ptr], - (numrecs - ptr) * sizeof(*lpp)); - xfs_alloc_log_ptrs(cur, bp, ptr, numrecs - 1); - xfs_alloc_log_keys(cur, bp, ptr, numrecs - 1); - } - } - /* - * It's a leaf. 
Excise the record being deleted, by sliding the - * entries past it down one. Log the changed areas of the block. - */ - else { - lrp = XFS_ALLOC_REC_ADDR(block, 1, cur); - if (ptr < numrecs) { - memmove(&lrp[ptr - 1], &lrp[ptr], - (numrecs - ptr) * sizeof(*lrp)); - xfs_alloc_log_recs(cur, bp, ptr, numrecs - 1); - } - /* - * If it's the first record in the block, we'll need a key - * structure to pass up to the next level (updkey). - */ - if (ptr == 1) { - key.ar_startblock = lrp->ar_startblock; - key.ar_blockcount = lrp->ar_blockcount; - lkp = &key; - } - } - /* - * Decrement and log the number of entries in the block. - */ - numrecs--; - block->bb_numrecs = cpu_to_be16(numrecs); - xfs_alloc_log_block(cur->bc_tp, bp, XFS_BB_NUMRECS); - /* - * See if the longest free extent in the allocation group was - * changed by this operation. True if it's the by-size btree, and - * this is the leaf level, and there is no right sibling block, - * and this was the last record. - */ + xfs_trans_agbtree_delta(cur->bc_tp, 1); + XFS_BTREE_TRACE_CURSOR(cur, EXIT); + new->u.alloc = cpu_to_be32(bno); + *stat = 1; + return 0; +} + +STATIC int +xfs_alloc_free_block( + xfs_btree_cur_t *cur, + xfs_buf_t *bp, + int size) +{ + xfs_agf_t *agf; /* allocation group freelist header */ + int error; + xfs_agblock_t bno; + + bno = XFS_DADDR_TO_AGBNO(cur->bc_mp, XFS_BUF_ADDR(bp)); agf = XFS_BUF_TO_AGF(cur->bc_private.a.agbp); - mp = cur->bc_mp; - if (level == 0 && - cur->bc_btnum == XFS_BTNUM_CNT && - be32_to_cpu(block->bb_rightsib) == NULLAGBLOCK && - ptr > numrecs) { - ASSERT(ptr == numrecs + 1); - /* - * There are still records in the block. Grab the size - * from the last one. - */ - if (numrecs) { - rrp = XFS_ALLOC_REC_ADDR(block, numrecs, cur); - agf->agf_longest = rrp->ar_blockcount; - } - /* - * No free extents left. 
- */ - else - agf->agf_longest = 0; - mp->m_perag[be32_to_cpu(agf->agf_seqno)].pagf_longest = - be32_to_cpu(agf->agf_longest); - xfs_alloc_log_agf(cur->bc_tp, cur->bc_private.a.agbp, - XFS_AGF_LONGEST); - } - /* - * Is this the root level? If so, we're almost done. - */ - if (level == cur->bc_nlevels - 1) { - /* - * If this is the root level, - * and there's only one entry left, - * and it's NOT the leaf level, - * then we can get rid of this level. - */ - if (numrecs == 1 && level > 0) { - /* - * lpp is still set to the first pointer in the block. - * Make it the new root of the btree. - */ - bno = be32_to_cpu(agf->agf_roots[cur->bc_btnum]); - agf->agf_roots[cur->bc_btnum] = *lpp; - be32_add(&agf->agf_levels[cur->bc_btnum], -1); - mp->m_perag[be32_to_cpu(agf->agf_seqno)].pagf_levels[cur->bc_btnum]--; - /* - * Put this buffer/block on the ag's freelist. - */ - error = xfs_alloc_put_freelist(cur->bc_tp, - cur->bc_private.a.agbp, NULL, bno, 1); - if (error) - return error; - /* - * Since blocks move to the free list without the - * coordination used in xfs_bmap_finish, we can't allow - * block to be available for reallocation and - * non-transaction writing (user data) until we know - * that the transaction that moved it to the free list - * is permanently on disk. We track the blocks by - * declaring these blocks as "busy"; the busy list is - * maintained on a per-ag basis and each transaction - * records which entries should be removed when the - * iclog commits to disk. If a busy block is - * allocated, the iclog is pushed up to the LSN - * that freed the block. - */ - xfs_alloc_mark_busy(cur->bc_tp, - be32_to_cpu(agf->agf_seqno), bno, 1); - - xfs_trans_agbtree_delta(cur->bc_tp, -1); - xfs_alloc_log_agf(cur->bc_tp, cur->bc_private.a.agbp, - XFS_AGF_ROOTS | XFS_AGF_LEVELS); - /* - * Update the cursor so there's one fewer level. 
- */ - xfs_btree_setbuf(cur, level, NULL); - cur->bc_nlevels--; - } else if (level > 0 && - (error = xfs_alloc_decrement(cur, level, &i))) - return error; - *stat = 1; - return 0; - } - /* - * If we deleted the leftmost entry in the block, update the - * key values above us in the tree. - */ - if (ptr == 1 && (error = xfs_alloc_updkey(cur, lkp, level + 1))) - return error; - /* - * If the number of records remaining in the block is at least - * the minimum, we're done. - */ - if (numrecs >= XFS_ALLOC_BLOCK_MINRECS(level, cur)) { - if (level > 0 && (error = xfs_alloc_decrement(cur, level, &i))) - return error; - *stat = 1; - return 0; - } - /* - * Otherwise, we have to move some records around to keep the - * tree balanced. Look at the left and right sibling blocks to - * see if we can re-balance by moving only one record. - */ - rbno = be32_to_cpu(block->bb_rightsib); - lbno = be32_to_cpu(block->bb_leftsib); - bno = NULLAGBLOCK; - ASSERT(rbno != NULLAGBLOCK || lbno != NULLAGBLOCK); - /* - * Duplicate the cursor so our btree manipulations here won't - * disrupt the next level up. - */ - if ((error = xfs_btree_dup_cursor(cur, &tcur))) - return error; - /* - * If there's a right sibling, see if it's ok to shift an entry - * out of it. - */ - if (rbno != NULLAGBLOCK) { - /* - * Move the temp cursor to the last entry in the next block. - * Actually any entry but the first would suffice. - */ - i = xfs_btree_lastrec(tcur, level); - XFS_WANT_CORRUPTED_GOTO(i == 1, error0); - if ((error = xfs_alloc_increment(tcur, level, &i))) - goto error0; - XFS_WANT_CORRUPTED_GOTO(i == 1, error0); - i = xfs_btree_lastrec(tcur, level); - XFS_WANT_CORRUPTED_GOTO(i == 1, error0); - /* - * Grab a pointer to the block. - */ - rbp = tcur->bc_bufs[level]; - right = XFS_BUF_TO_ALLOC_BLOCK(rbp); -#ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, right, level, rbp))) - goto error0; -#endif - /* - * Grab the current block number, for future use. 
- */ - bno = be32_to_cpu(right->bb_leftsib); - /* - * If right block is full enough so that removing one entry - * won't make it too empty, and left-shifting an entry out - * of right to us works, we're done. - */ - if (be16_to_cpu(right->bb_numrecs) - 1 >= - XFS_ALLOC_BLOCK_MINRECS(level, cur)) { - if ((error = xfs_alloc_lshift(tcur, level, &i))) - goto error0; - if (i) { - ASSERT(be16_to_cpu(block->bb_numrecs) >= - XFS_ALLOC_BLOCK_MINRECS(level, cur)); - xfs_btree_del_cursor(tcur, - XFS_BTREE_NOERROR); - if (level > 0 && - (error = xfs_alloc_decrement(cur, level, - &i))) - return error; - *stat = 1; - return 0; - } - } - /* - * Otherwise, grab the number of records in right for - * future reference, and fix up the temp cursor to point - * to our block again (last record). - */ - rrecs = be16_to_cpu(right->bb_numrecs); - if (lbno != NULLAGBLOCK) { - i = xfs_btree_firstrec(tcur, level); - XFS_WANT_CORRUPTED_GOTO(i == 1, error0); - if ((error = xfs_alloc_decrement(tcur, level, &i))) - goto error0; - XFS_WANT_CORRUPTED_GOTO(i == 1, error0); - } - } - /* - * If there's a left sibling, see if it's ok to shift an entry - * out of it. - */ - if (lbno != NULLAGBLOCK) { - /* - * Move the temp cursor to the first entry in the - * previous block. - */ - i = xfs_btree_firstrec(tcur, level); - XFS_WANT_CORRUPTED_GOTO(i == 1, error0); - if ((error = xfs_alloc_decrement(tcur, level, &i))) - goto error0; - XFS_WANT_CORRUPTED_GOTO(i == 1, error0); - xfs_btree_firstrec(tcur, level); - /* - * Grab a pointer to the block. - */ - lbp = tcur->bc_bufs[level]; - left = XFS_BUF_TO_ALLOC_BLOCK(lbp); -#ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, left, level, lbp))) - goto error0; -#endif - /* - * Grab the current block number, for future use. - */ - bno = be32_to_cpu(left->bb_rightsib); - /* - * If left block is full enough so that removing one entry - * won't make it too empty, and right-shifting an entry out - * of left to us works, we're done. 
- */ - if (be16_to_cpu(left->bb_numrecs) - 1 >= - XFS_ALLOC_BLOCK_MINRECS(level, cur)) { - if ((error = xfs_alloc_rshift(tcur, level, &i))) - goto error0; - if (i) { - ASSERT(be16_to_cpu(block->bb_numrecs) >= - XFS_ALLOC_BLOCK_MINRECS(level, cur)); - xfs_btree_del_cursor(tcur, - XFS_BTREE_NOERROR); - if (level == 0) - cur->bc_ptrs[0]++; - *stat = 1; - return 0; - } - } - /* - * Otherwise, grab the number of records in right for - * future reference. - */ - lrecs = be16_to_cpu(left->bb_numrecs); - } - /* - * Delete the temp cursor, we're done with it. - */ - xfs_btree_del_cursor(tcur, XFS_BTREE_NOERROR); - /* - * If here, we need to do a join to keep the tree balanced. - */ - ASSERT(bno != NULLAGBLOCK); - /* - * See if we can join with the left neighbor block. - */ - if (lbno != NULLAGBLOCK && - lrecs + numrecs <= XFS_ALLOC_BLOCK_MAXRECS(level, cur)) { - /* - * Set "right" to be the starting block, - * "left" to be the left neighbor. - */ - rbno = bno; - right = block; - rrecs = be16_to_cpu(right->bb_numrecs); - rbp = bp; - if ((error = xfs_btree_read_bufs(mp, cur->bc_tp, - cur->bc_private.a.agno, lbno, 0, &lbp, - XFS_ALLOC_BTREE_REF))) - return error; - left = XFS_BUF_TO_ALLOC_BLOCK(lbp); - lrecs = be16_to_cpu(left->bb_numrecs); - if ((error = xfs_btree_check_sblock(cur, left, level, lbp))) - return error; - } - /* - * If that won't work, see if we can join with the right neighbor block. - */ - else if (rbno != NULLAGBLOCK && - rrecs + numrecs <= XFS_ALLOC_BLOCK_MAXRECS(level, cur)) { - /* - * Set "left" to be the starting block, - * "right" to be the right neighbor. 
- */ - lbno = bno; - left = block; - lrecs = be16_to_cpu(left->bb_numrecs); - lbp = bp; - if ((error = xfs_btree_read_bufs(mp, cur->bc_tp, - cur->bc_private.a.agno, rbno, 0, &rbp, - XFS_ALLOC_BTREE_REF))) - return error; - right = XFS_BUF_TO_ALLOC_BLOCK(rbp); - rrecs = be16_to_cpu(right->bb_numrecs); - if ((error = xfs_btree_check_sblock(cur, right, level, rbp))) - return error; - } - /* - * Otherwise, we can't fix the imbalance. - * Just return. This is probably a logic error, but it's not fatal. - */ - else { - if (level > 0 && (error = xfs_alloc_decrement(cur, level, &i))) - return error; - *stat = 1; - return 0; - } - /* - * We're now going to join "left" and "right" by moving all the stuff - * in "right" to "left" and deleting "right". - */ - if (level > 0) { - /* - * It's a non-leaf. Move keys and pointers. - */ - lkp = XFS_ALLOC_KEY_ADDR(left, lrecs + 1, cur); - lpp = XFS_ALLOC_PTR_ADDR(left, lrecs + 1, cur); - rkp = XFS_ALLOC_KEY_ADDR(right, 1, cur); - rpp = XFS_ALLOC_PTR_ADDR(right, 1, cur); -#ifdef DEBUG - for (i = 0; i < rrecs; i++) { - if ((error = xfs_btree_check_sptr(cur, be32_to_cpu(rpp[i]), level))) - return error; - } -#endif - memcpy(lkp, rkp, rrecs * sizeof(*lkp)); - memcpy(lpp, rpp, rrecs * sizeof(*lpp)); - xfs_alloc_log_keys(cur, lbp, lrecs + 1, lrecs + rrecs); - xfs_alloc_log_ptrs(cur, lbp, lrecs + 1, lrecs + rrecs); - } else { - /* - * It's a leaf. Move records. - */ - lrp = XFS_ALLOC_REC_ADDR(left, lrecs + 1, cur); - rrp = XFS_ALLOC_REC_ADDR(right, 1, cur); - memcpy(lrp, rrp, rrecs * sizeof(*lrp)); - xfs_alloc_log_recs(cur, lbp, lrecs + 1, lrecs + rrecs); - } - /* - * If we joined with the left neighbor, set the buffer in the - * cursor to the left block, and fix up the index. - */ - if (bp != lbp) { - xfs_btree_setbuf(cur, level, lbp); - cur->bc_ptrs[level] += lrecs; - } - /* - * If we joined with the right neighbor and there's a level above - * us, increment the cursor at that level. 
- */ - else if (level + 1 < cur->bc_nlevels && - (error = xfs_alloc_increment(cur, level + 1, &i))) - return error; - /* - * Fix up the number of records in the surviving block. - */ - lrecs += rrecs; - left->bb_numrecs = cpu_to_be16(lrecs); - /* - * Fix up the right block pointer in the surviving block, and log it. - */ - left->bb_rightsib = right->bb_rightsib; - xfs_alloc_log_block(cur->bc_tp, lbp, XFS_BB_NUMRECS | XFS_BB_RIGHTSIB); - /* - * If there is a right sibling now, make it point to the - * remaining block. - */ - if (be32_to_cpu(left->bb_rightsib) != NULLAGBLOCK) { - xfs_alloc_block_t *rrblock; - xfs_buf_t *rrbp; - - if ((error = xfs_btree_read_bufs(mp, cur->bc_tp, - cur->bc_private.a.agno, be32_to_cpu(left->bb_rightsib), 0, - &rrbp, XFS_ALLOC_BTREE_REF))) - return error; - rrblock = XFS_BUF_TO_ALLOC_BLOCK(rrbp); - if ((error = xfs_btree_check_sblock(cur, rrblock, level, rrbp))) - return error; - rrblock->bb_leftsib = cpu_to_be32(lbno); - xfs_alloc_log_block(cur->bc_tp, rrbp, XFS_BB_LEFTSIB); - } - /* - * Free the deleting block by putting it on the freelist. - */ - error = xfs_alloc_put_freelist(cur->bc_tp, - cur->bc_private.a.agbp, NULL, rbno, 1); + error = xfs_alloc_put_freelist(cur->bc_tp, cur->bc_private.a.agbp, + NULL, bno, size); if (error) return error; /* @@ -568,278 +168,15 @@ xfs_alloc_delrec( */ xfs_alloc_mark_busy(cur->bc_tp, be32_to_cpu(agf->agf_seqno), bno, 1); xfs_trans_agbtree_delta(cur->bc_tp, -1); - - /* - * Adjust the current level's cursor so that we're left referring - * to the right node, after we're done. - * If this leaves the ptr value 0 our caller will fix it up. - */ - if (level > 0) - cur->bc_ptrs[level]--; - /* - * Return value means the next level up has something to do. - */ - *stat = 2; return 0; - -error0: - xfs_btree_del_cursor(tcur, XFS_BTREE_ERROR); - return error; } /* - * Insert one record/level. Return information to the caller - * allowing the next level up to proceed if necessary. 
- */ -STATIC int /* error */ -xfs_alloc_insrec( - xfs_btree_cur_t *cur, /* btree cursor */ - int level, /* level to insert record at */ - xfs_agblock_t *bnop, /* i/o: block number inserted */ - xfs_alloc_rec_t *recp, /* i/o: record data inserted */ - xfs_btree_cur_t **curp, /* output: new cursor replacing cur */ - int *stat) /* output: success/failure */ -{ - xfs_agf_t *agf; /* allocation group freelist header */ - xfs_alloc_block_t *block; /* btree block record/key lives in */ - xfs_buf_t *bp; /* buffer for block */ - int error; /* error return value */ - int i; /* loop index */ - xfs_alloc_key_t key; /* key value being inserted */ - xfs_alloc_key_t *kp; /* pointer to btree keys */ - xfs_agblock_t nbno; /* block number of allocated block */ - xfs_btree_cur_t *ncur; /* new cursor to be used at next lvl */ - xfs_alloc_key_t nkey; /* new key value, from split */ - xfs_alloc_rec_t nrec; /* new record value, for caller */ - int numrecs; - int optr; /* old ptr value */ - xfs_alloc_ptr_t *pp; /* pointer to btree addresses */ - int ptr; /* index in btree block for this rec */ - xfs_alloc_rec_t *rp; /* pointer to btree records */ - - ASSERT(be32_to_cpu(recp->ar_blockcount) > 0); - - /* - * GCC doesn't understand the (arguably complex) control flow in - * this function and complains about uninitialized structure fields - * without this. - */ - memset(&nrec, 0, sizeof(nrec)); - - /* - * If we made it to the root level, allocate a new root block - * and we're done. - */ - if (level >= cur->bc_nlevels) { - XFS_STATS_INC(xs_abt_insrec); - if ((error = xfs_alloc_newroot(cur, &i))) - return error; - *bnop = NULLAGBLOCK; - *stat = i; - return 0; - } - /* - * Make a key out of the record data to be inserted, and save it. - */ - key.ar_startblock = recp->ar_startblock; - key.ar_blockcount = recp->ar_blockcount; - optr = ptr = cur->bc_ptrs[level]; - /* - * If we're off the left edge, return failure. 
- */ - if (ptr == 0) { - *stat = 0; - return 0; - } - XFS_STATS_INC(xs_abt_insrec); - /* - * Get pointers to the btree buffer and block. - */ - bp = cur->bc_bufs[level]; - block = XFS_BUF_TO_ALLOC_BLOCK(bp); - numrecs = be16_to_cpu(block->bb_numrecs); -#ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, block, level, bp))) - return error; - /* - * Check that the new entry is being inserted in the right place. - */ - if (ptr <= numrecs) { - if (level == 0) { - rp = XFS_ALLOC_REC_ADDR(block, ptr, cur); - xfs_btree_check_rec(cur->bc_btnum, recp, rp); - } else { - kp = XFS_ALLOC_KEY_ADDR(block, ptr, cur); - xfs_btree_check_key(cur->bc_btnum, &key, kp); - } - } -#endif - nbno = NULLAGBLOCK; - ncur = NULL; - /* - * If the block is full, we can't insert the new entry until we - * make the block un-full. - */ - if (numrecs == XFS_ALLOC_BLOCK_MAXRECS(level, cur)) { - /* - * First, try shifting an entry to the right neighbor. - */ - if ((error = xfs_alloc_rshift(cur, level, &i))) - return error; - if (i) { - /* nothing */ - } - /* - * Next, try shifting an entry to the left neighbor. - */ - else { - if ((error = xfs_alloc_lshift(cur, level, &i))) - return error; - if (i) - optr = ptr = cur->bc_ptrs[level]; - else { - /* - * Next, try splitting the current block in - * half. If this works we have to re-set our - * variables because we could be in a - * different block now. - */ - if ((error = xfs_alloc_split(cur, level, &nbno, - &nkey, &ncur, &i))) - return error; - if (i) { - bp = cur->bc_bufs[level]; - block = XFS_BUF_TO_ALLOC_BLOCK(bp); -#ifdef DEBUG - if ((error = - xfs_btree_check_sblock(cur, - block, level, bp))) - return error; -#endif - ptr = cur->bc_ptrs[level]; - nrec.ar_startblock = nkey.ar_startblock; - nrec.ar_blockcount = nkey.ar_blockcount; - } - /* - * Otherwise the insert fails. - */ - else { - *stat = 0; - return 0; - } - } - } - } - /* - * At this point we know there's room for our new entry in the block - * we're pointing at. 
- */ - numrecs = be16_to_cpu(block->bb_numrecs); - if (level > 0) { - /* - * It's a non-leaf entry. Make a hole for the new data - * in the key and ptr regions of the block. - */ - kp = XFS_ALLOC_KEY_ADDR(block, 1, cur); - pp = XFS_ALLOC_PTR_ADDR(block, 1, cur); -#ifdef DEBUG - for (i = numrecs; i >= ptr; i--) { - if ((error = xfs_btree_check_sptr(cur, be32_to_cpu(pp[i - 1]), level))) - return error; - } -#endif - memmove(&kp[ptr], &kp[ptr - 1], - (numrecs - ptr + 1) * sizeof(*kp)); - memmove(&pp[ptr], &pp[ptr - 1], - (numrecs - ptr + 1) * sizeof(*pp)); -#ifdef DEBUG - if ((error = xfs_btree_check_sptr(cur, *bnop, level))) - return error; -#endif - /* - * Now stuff the new data in, bump numrecs and log the new data. - */ - kp[ptr - 1] = key; - pp[ptr - 1] = cpu_to_be32(*bnop); - numrecs++; - block->bb_numrecs = cpu_to_be16(numrecs); - xfs_alloc_log_keys(cur, bp, ptr, numrecs); - xfs_alloc_log_ptrs(cur, bp, ptr, numrecs); -#ifdef DEBUG - if (ptr < numrecs) - xfs_btree_check_key(cur->bc_btnum, kp + ptr - 1, - kp + ptr); -#endif - } else { - /* - * It's a leaf entry. Make a hole for the new record. - */ - rp = XFS_ALLOC_REC_ADDR(block, 1, cur); - memmove(&rp[ptr], &rp[ptr - 1], - (numrecs - ptr + 1) * sizeof(*rp)); - /* - * Now stuff the new record in, bump numrecs - * and log the new data. - */ - rp[ptr - 1] = *recp; - numrecs++; - block->bb_numrecs = cpu_to_be16(numrecs); - xfs_alloc_log_recs(cur, bp, ptr, numrecs); -#ifdef DEBUG - if (ptr < numrecs) - xfs_btree_check_rec(cur->bc_btnum, rp + ptr - 1, - rp + ptr); -#endif - } - /* - * Log the new number of records in the btree header. - */ - xfs_alloc_log_block(cur->bc_tp, bp, XFS_BB_NUMRECS); - /* - * If we inserted at the start of a block, update the parents' keys. - */ - if (optr == 1 && (error = xfs_alloc_updkey(cur, &key, level + 1))) - return error; - /* - * Look to see if the longest extent in the allocation group - * needs to be updated. 
- */ - - agf = XFS_BUF_TO_AGF(cur->bc_private.a.agbp); - if (level == 0 && - cur->bc_btnum == XFS_BTNUM_CNT && - be32_to_cpu(block->bb_rightsib) == NULLAGBLOCK && - be32_to_cpu(recp->ar_blockcount) > be32_to_cpu(agf->agf_longest)) { - /* - * If this is a leaf in the by-size btree and there - * is no right sibling block and this block is bigger - * than the previous longest block, update it. - */ - agf->agf_longest = recp->ar_blockcount; - cur->bc_mp->m_perag[be32_to_cpu(agf->agf_seqno)].pagf_longest - = be32_to_cpu(recp->ar_blockcount); - xfs_alloc_log_agf(cur->bc_tp, cur->bc_private.a.agbp, - XFS_AGF_LONGEST); - } - /* - * Return the new block number, if any. - * If there is one, give back a record value and a cursor too. - */ - *bnop = nbno; - if (nbno != NULLAGBLOCK) { - *recp = nrec; - *curp = ncur; - } - *stat = 1; - return 0; -} - -/* - * Log header fields from a btree block. + * Log fields from the btree block header. */ STATIC void xfs_alloc_log_block( - xfs_trans_t *tp, /* transaction pointer */ + xfs_btree_cur_t *cur, /* btree cursor */ xfs_buf_t *bp, /* buffer containing btree block */ int fields) /* mask of fields: XFS_BB_... */ { @@ -854,1243 +191,629 @@ xfs_alloc_log_block( sizeof(xfs_alloc_block_t) }; + XFS_BTREE_TRACE_CURSOR(cur, ENTRY); + XFS_BTREE_TRACE_ARGBI(cur, bp, fields); xfs_btree_offsets(fields, offsets, XFS_BB_NUM_BITS, &first, &last); - xfs_trans_log_buf(tp, bp, first, last); + xfs_trans_log_buf(cur->bc_tp, bp, first, last); + XFS_BTREE_TRACE_CURSOR(cur, EXIT); } -/* - * Log keys from a btree block (nonleaf). 
- */ -STATIC void -xfs_alloc_log_keys( - xfs_btree_cur_t *cur, /* btree cursor */ - xfs_buf_t *bp, /* buffer containing btree block */ - int kfirst, /* index of first key to log */ - int klast) /* index of last key to log */ +static const struct xfs_btree_block_ops xfs_alloc_blkops = { + .get_buf = xfs_alloc_get_buf, + .read_buf = xfs_alloc_read_buf, + .get_block = xfs_alloc_get_block, + .buf_to_block = xfs_alloc_buf_to_block, + .buf_to_ptr = xfs_alloc_buf_to_ptr, + .log_block = xfs_alloc_log_block, + .check_block = xfs_btree_check_sblock, + + .alloc_block = xfs_alloc_alloc_block, + .free_block = xfs_alloc_free_block, + + .get_sibling = xfs_btree_get_ssibling, + .set_sibling = xfs_btree_set_ssibling, + .init_sibling = xfs_btree_init_sibling, +}; + +STATIC int +xfs_alloc_get_minrecs( + xfs_btree_cur_t *cur, + int lev) { - xfs_alloc_block_t *block; /* btree block to log from */ - int first; /* first byte offset logged */ - xfs_alloc_key_t *kp; /* key pointer in btree block */ - int last; /* last byte offset logged */ + return cur->bc_mp->m_alloc_mnr[lev != 0]; +} - block = XFS_BUF_TO_ALLOC_BLOCK(bp); - kp = XFS_ALLOC_KEY_ADDR(block, 1, cur); - first = (int)((xfs_caddr_t)&kp[kfirst - 1] - (xfs_caddr_t)block); - last = (int)(((xfs_caddr_t)&kp[klast] - 1) - (xfs_caddr_t)block); - xfs_trans_log_buf(cur->bc_tp, bp, first, last); +STATIC int +xfs_alloc_get_maxrecs( + xfs_btree_cur_t *cur, + int lev) +{ + return cur->bc_mp->m_alloc_mxr[lev != 0]; } -/* - * Log block pointer fields from a btree block (nonleaf). 
- */ -STATIC void -xfs_alloc_log_ptrs( - xfs_btree_cur_t *cur, /* btree cursor */ - xfs_buf_t *bp, /* buffer containing btree block */ - int pfirst, /* index of first pointer to log */ - int plast) /* index of last pointer to log */ +STATIC int +xfs_btree_get_numrecs( + xfs_btree_cur_t *cur, + xfs_btree_block_t *block) { - xfs_alloc_block_t *block; /* btree block to log from */ - int first; /* first byte offset logged */ - int last; /* last byte offset logged */ - xfs_alloc_ptr_t *pp; /* block-pointer pointer in btree blk */ + BUG_ON(be16_to_cpu(block->bb_h.bb_numrecs) < 0); + BUG_ON(be16_to_cpu(block->bb_h.bb_numrecs) > 1000); + return be16_to_cpu(block->bb_h.bb_numrecs); +} - block = XFS_BUF_TO_ALLOC_BLOCK(bp); - pp = XFS_ALLOC_PTR_ADDR(block, 1, cur); - first = (int)((xfs_caddr_t)&pp[pfirst - 1] - (xfs_caddr_t)block); - last = (int)(((xfs_caddr_t)&pp[plast] - 1) - (xfs_caddr_t)block); - xfs_trans_log_buf(cur->bc_tp, bp, first, last); +STATIC void +xfs_btree_set_numrecs( + xfs_btree_cur_t *cur, + xfs_btree_block_t *block, + int numrecs) +{ + BUG_ON(numrecs < 0); + BUG_ON(numrecs > 1000); + block->bb_h.bb_numrecs = cpu_to_be16(numrecs); } -/* - * Log records from a btree block (leaf). 
- */ STATIC void -xfs_alloc_log_recs( - xfs_btree_cur_t *cur, /* btree cursor */ - xfs_buf_t *bp, /* buffer containing btree block */ - int rfirst, /* index of first record to log */ - int rlast) /* index of last record to log */ +xfs_alloc_init_key_from_rec( + xfs_btree_cur_t *cur, + xfs_btree_key_t *key, + xfs_btree_rec_t *rec) { - xfs_alloc_block_t *block; /* btree block to log from */ - int first; /* first byte offset logged */ - int last; /* last byte offset logged */ - xfs_alloc_rec_t *rp; /* record pointer for btree block */ - - - block = XFS_BUF_TO_ALLOC_BLOCK(bp); - rp = XFS_ALLOC_REC_ADDR(block, 1, cur); -#ifdef DEBUG - { - xfs_agf_t *agf; - xfs_alloc_rec_t *p; - - agf = XFS_BUF_TO_AGF(cur->bc_private.a.agbp); - for (p = &rp[rfirst - 1]; p <= &rp[rlast - 1]; p++) - ASSERT(be32_to_cpu(p->ar_startblock) + - be32_to_cpu(p->ar_blockcount) <= - be32_to_cpu(agf->agf_length)); - } -#endif - first = (int)((xfs_caddr_t)&rp[rfirst - 1] - (xfs_caddr_t)block); - last = (int)(((xfs_caddr_t)&rp[rlast] - 1) - (xfs_caddr_t)block); - xfs_trans_log_buf(cur->bc_tp, bp, first, last); + key->u.alloc.ar_startblock = rec->u.alloc.ar_startblock; + key->u.alloc.ar_blockcount = rec->u.alloc.ar_blockcount; + BUG_ON(key->u.alloc.ar_startblock == 0); } /* - * Lookup the record. The cursor is made to point to it, based on dir. - * Return 0 if can't find any such record, 1 for success. + * initial value of ptr for lookup */ -STATIC int /* error */ -xfs_alloc_lookup( - xfs_btree_cur_t *cur, /* btree cursor */ - xfs_lookup_t dir, /* <=, ==, or >= */ - int *stat) /* success/failure */ +STATIC void +xfs_alloc_init_ptr_from_cur( + xfs_btree_cur_t *cur, + xfs_btree_ptr_t *ptr) { - xfs_agblock_t agbno; /* a.g.
relative btree block number */ - xfs_agnumber_t agno; /* allocation group number */ - xfs_alloc_block_t *block=NULL; /* current btree block */ - int diff; /* difference for the current key */ - int error; /* error return value */ - int keyno=0; /* current key number */ - int level; /* level in the btree */ - xfs_mount_t *mp; /* file system mount point */ + xfs_agf_t *agf; /* a.g. freespace header */ - XFS_STATS_INC(xs_abt_lookup); - /* - * Get the allocation group header, and the root block number. - */ - mp = cur->bc_mp; + agf = XFS_BUF_TO_AGF(cur->bc_private.a.agbp); + ASSERT(cur->bc_private.a.agno == be32_to_cpu(agf->agf_seqno)); + ptr->u.alloc = agf->agf_roots[cur->bc_btnum]; + BUG_ON(ptr->u.alloc == 0); +} - { - xfs_agf_t *agf; /* a.g. freespace header */ +STATIC void +xfs_alloc_init_rec_from_key( + xfs_btree_cur_t *cur, + xfs_btree_key_t *key, + xfs_btree_rec_t *rec) +{ + BUG_ON(key->u.alloc.ar_startblock == 0); + rec->u.alloc.ar_startblock = key->u.alloc.ar_startblock; + rec->u.alloc.ar_blockcount = key->u.alloc.ar_blockcount; +} - agf = XFS_BUF_TO_AGF(cur->bc_private.a.agbp); - agno = be32_to_cpu(agf->agf_seqno); - agbno = be32_to_cpu(agf->agf_roots[cur->bc_btnum]); - } - /* - * Iterate over each level in the btree, starting at the root. - * For each level above the leaves, find the key we need, based - * on the lookup record, then follow the corresponding block - * pointer down to the next level. - */ - for (level = cur->bc_nlevels - 1, diff = 1; level >= 0; level--) { - xfs_buf_t *bp; /* buffer pointer for btree block */ - xfs_daddr_t d; /* disk address of btree block */ - - /* - * Get the disk address we're looking for. - */ - d = XFS_AGB_TO_DADDR(mp, agno, agbno); - /* - * If the old buffer at this level is for a different block, - * throw it away, otherwise just use it. - */ - bp = cur->bc_bufs[level]; - if (bp && XFS_BUF_ADDR(bp) != d) - bp = NULL; - if (!bp) { - /* - * Need to get a new buffer. 
Read it, then - * set it in the cursor, releasing the old one. - */ - if ((error = xfs_btree_read_bufs(mp, cur->bc_tp, agno, - agbno, 0, &bp, XFS_ALLOC_BTREE_REF))) - return error; - xfs_btree_setbuf(cur, level, bp); - /* - * Point to the btree block, now that we have the buffer - */ - block = XFS_BUF_TO_ALLOC_BLOCK(bp); - if ((error = xfs_btree_check_sblock(cur, block, level, - bp))) - return error; - } else - block = XFS_BUF_TO_ALLOC_BLOCK(bp); - /* - * If we already had a key match at a higher level, we know - * we need to use the first entry in this block. - */ - if (diff == 0) - keyno = 1; - /* - * Otherwise we need to search this block. Do a binary search. - */ - else { - int high; /* high entry number */ - xfs_alloc_key_t *kkbase=NULL;/* base of keys in block */ - xfs_alloc_rec_t *krbase=NULL;/* base of records in block */ - int low; /* low entry number */ - - /* - * Get a pointer to keys or records. - */ - if (level > 0) - kkbase = XFS_ALLOC_KEY_ADDR(block, 1, cur); - else - krbase = XFS_ALLOC_REC_ADDR(block, 1, cur); - /* - * Set low and high entry numbers, 1-based. - */ - low = 1; - if (!(high = be16_to_cpu(block->bb_numrecs))) { - /* - * If the block is empty, the tree must - * be an empty leaf. - */ - ASSERT(level == 0 && cur->bc_nlevels == 1); - cur->bc_ptrs[0] = dir != XFS_LOOKUP_LE; - *stat = 0; - return 0; - } - /* - * Binary search the block. - */ - while (low <= high) { - xfs_extlen_t blockcount; /* key value */ - xfs_agblock_t startblock; /* key value */ - - XFS_STATS_INC(xs_abt_compare); - /* - * keyno is average of low and high. - */ - keyno = (low + high) >> 1; - /* - * Get startblock & blockcount. 
- */ - if (level > 0) { - xfs_alloc_key_t *kkp; - - kkp = kkbase + keyno - 1; - startblock = be32_to_cpu(kkp->ar_startblock); - blockcount = be32_to_cpu(kkp->ar_blockcount); - } else { - xfs_alloc_rec_t *krp; - - krp = krbase + keyno - 1; - startblock = be32_to_cpu(krp->ar_startblock); - blockcount = be32_to_cpu(krp->ar_blockcount); - } - /* - * Compute difference to get next direction. - */ - if (cur->bc_btnum == XFS_BTNUM_BNO) - diff = (int)startblock - - (int)cur->bc_rec.a.ar_startblock; - else if (!(diff = (int)blockcount - - (int)cur->bc_rec.a.ar_blockcount)) - diff = (int)startblock - - (int)cur->bc_rec.a.ar_startblock; - /* - * Less than, move right. - */ - if (diff < 0) - low = keyno + 1; - /* - * Greater than, move left. - */ - else if (diff > 0) - high = keyno - 1; - /* - * Equal, we're done. - */ - else - break; - } - } - /* - * If there are more levels, set up for the next level - * by getting the block number and filling in the cursor. - */ - if (level > 0) { - /* - * If we moved left, need the previous key number, - * unless there isn't one. - */ - if (diff > 0 && --keyno < 1) - keyno = 1; - agbno = be32_to_cpu(*XFS_ALLOC_PTR_ADDR(block, keyno, cur)); -#ifdef DEBUG - if ((error = xfs_btree_check_sptr(cur, agbno, level))) - return error; -#endif - cur->bc_ptrs[level] = keyno; - } - } - /* - * Done with the search. - * See if we need to adjust the results. - */ - if (dir != XFS_LOOKUP_LE && diff < 0) { - keyno++; - /* - * If ge search and we went off the end of the block, but it's - * not the last block, we're in the wrong block. - */ - if (dir == XFS_LOOKUP_GE && - keyno > be16_to_cpu(block->bb_numrecs) && - be32_to_cpu(block->bb_rightsib) != NULLAGBLOCK) { - int i; - - cur->bc_ptrs[0] = keyno; - if ((error = xfs_alloc_increment(cur, 0, &i))) - return error; - XFS_WANT_CORRUPTED_RETURN(i == 1); - *stat = 1; - return 0; - } - } - else if (dir == XFS_LOOKUP_LE && diff > 0) - keyno--; - cur->bc_ptrs[0] = keyno; - /* - * Return if we succeeded or not. 
- */ - if (keyno == 0 || keyno > be16_to_cpu(block->bb_numrecs)) - *stat = 0; - else - *stat = ((dir != XFS_LOOKUP_EQ) || (diff == 0)); - return 0; +STATIC void +xfs_alloc_init_rec_from_cur( + xfs_btree_cur_t *cur, + xfs_btree_rec_t *rec) +{ + BUG_ON(cur->bc_rec.a.ar_startblock == 0); + rec->u.alloc.ar_startblock = cpu_to_be32(cur->bc_rec.a.ar_startblock); + rec->u.alloc.ar_blockcount = cpu_to_be32(cur->bc_rec.a.ar_blockcount); } -/* - * Move 1 record left from cur/level if possible. - * Update cur to reflect the new path. - */ -STATIC int /* error */ -xfs_alloc_lshift( - xfs_btree_cur_t *cur, /* btree cursor */ - int level, /* level to shift record on */ - int *stat) /* success/failure */ +STATIC xfs_btree_key_t * +xfs_alloc_key_addr( + xfs_btree_cur_t *cur, + int index, + xfs_btree_block_t *block) { - int error; /* error return value */ -#ifdef DEBUG - int i; /* loop index */ -#endif - xfs_alloc_key_t key; /* key value for leaf level upward */ - xfs_buf_t *lbp; /* buffer for left neighbor block */ - xfs_alloc_block_t *left; /* left neighbor btree block */ - int nrec; /* new number of left block entries */ - xfs_buf_t *rbp; /* buffer for right (current) block */ - xfs_alloc_block_t *right; /* right (current) btree block */ - xfs_alloc_key_t *rkp=NULL; /* key pointer for right block */ - xfs_alloc_ptr_t *rpp=NULL; /* address pointer for right block */ - xfs_alloc_rec_t *rrp=NULL; /* record pointer for right block */ + return (xfs_btree_key_t *)XFS_ALLOC_KEY_ADDR(&block->bb_h, index, cur); +} - /* - * Set up variables for this block as "right". - */ - rbp = cur->bc_bufs[level]; - right = XFS_BUF_TO_ALLOC_BLOCK(rbp); -#ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, right, level, rbp))) - return error; -#endif - /* - * If we've got no left sibling then we can't shift an entry left. - */ - if (be32_to_cpu(right->bb_leftsib) == NULLAGBLOCK) { - *stat = 0; - return 0; - } - /* - * If the cursor entry is the one that would be moved, don't - * do it... 
it's too complicated. - */ - if (cur->bc_ptrs[level] <= 1) { - *stat = 0; - return 0; - } - /* - * Set up the left neighbor as "left". - */ - if ((error = xfs_btree_read_bufs(cur->bc_mp, cur->bc_tp, - cur->bc_private.a.agno, be32_to_cpu(right->bb_leftsib), - 0, &lbp, XFS_ALLOC_BTREE_REF))) - return error; - left = XFS_BUF_TO_ALLOC_BLOCK(lbp); - if ((error = xfs_btree_check_sblock(cur, left, level, lbp))) - return error; - /* - * If it's full, it can't take another entry. - */ - if (be16_to_cpu(left->bb_numrecs) == XFS_ALLOC_BLOCK_MAXRECS(level, cur)) { - *stat = 0; - return 0; - } - nrec = be16_to_cpu(left->bb_numrecs) + 1; - /* - * If non-leaf, copy a key and a ptr to the left block. - */ - if (level > 0) { - xfs_alloc_key_t *lkp; /* key pointer for left block */ - xfs_alloc_ptr_t *lpp; /* address pointer for left block */ - - lkp = XFS_ALLOC_KEY_ADDR(left, nrec, cur); - rkp = XFS_ALLOC_KEY_ADDR(right, 1, cur); - *lkp = *rkp; - xfs_alloc_log_keys(cur, lbp, nrec, nrec); - lpp = XFS_ALLOC_PTR_ADDR(left, nrec, cur); - rpp = XFS_ALLOC_PTR_ADDR(right, 1, cur); -#ifdef DEBUG - if ((error = xfs_btree_check_sptr(cur, be32_to_cpu(*rpp), level))) - return error; -#endif - *lpp = *rpp; - xfs_alloc_log_ptrs(cur, lbp, nrec, nrec); - xfs_btree_check_key(cur->bc_btnum, lkp - 1, lkp); - } - /* - * If leaf, copy a record to the left block. - */ - else { - xfs_alloc_rec_t *lrp; /* record pointer for left block */ +STATIC xfs_btree_ptr_t * +xfs_alloc_ptr_addr( + xfs_btree_cur_t *cur, + int index, + xfs_btree_block_t *block) +{ + return (xfs_btree_ptr_t *)XFS_ALLOC_PTR_ADDR(&block->bb_h, index, cur); +} - lrp = XFS_ALLOC_REC_ADDR(left, nrec, cur); - rrp = XFS_ALLOC_REC_ADDR(right, 1, cur); - *lrp = *rrp; - xfs_alloc_log_recs(cur, lbp, nrec, nrec); - xfs_btree_check_rec(cur->bc_btnum, lrp - 1, lrp); - } - /* - * Bump and log left's numrecs, decrement and log right's numrecs. 
- */ - be16_add(&left->bb_numrecs, 1); - xfs_alloc_log_block(cur->bc_tp, lbp, XFS_BB_NUMRECS); - be16_add(&right->bb_numrecs, -1); - xfs_alloc_log_block(cur->bc_tp, rbp, XFS_BB_NUMRECS); - /* - * Slide the contents of right down one entry. - */ - if (level > 0) { -#ifdef DEBUG - for (i = 0; i < be16_to_cpu(right->bb_numrecs); i++) { - if ((error = xfs_btree_check_sptr(cur, be32_to_cpu(rpp[i + 1]), - level))) - return error; - } -#endif - memmove(rkp, rkp + 1, be16_to_cpu(right->bb_numrecs) * sizeof(*rkp)); - memmove(rpp, rpp + 1, be16_to_cpu(right->bb_numrecs) * sizeof(*rpp)); - xfs_alloc_log_keys(cur, rbp, 1, be16_to_cpu(right->bb_numrecs)); - xfs_alloc_log_ptrs(cur, rbp, 1, be16_to_cpu(right->bb_numrecs)); - } else { - memmove(rrp, rrp + 1, be16_to_cpu(right->bb_numrecs) * sizeof(*rrp)); - xfs_alloc_log_recs(cur, rbp, 1, be16_to_cpu(right->bb_numrecs)); - key.ar_startblock = rrp->ar_startblock; - key.ar_blockcount = rrp->ar_blockcount; - rkp = &key; - } - /* - * Update the parent key values of right. - */ - if ((error = xfs_alloc_updkey(cur, rkp, level + 1))) - return error; - /* - * Slide the cursor value left one. - */ - cur->bc_ptrs[level]--; - *stat = 1; - return 0; +STATIC xfs_btree_rec_t * +xfs_alloc_rec_addr( + xfs_btree_cur_t *cur, + int index, + xfs_btree_block_t *block) +{ + return (xfs_btree_rec_t *)XFS_ALLOC_REC_ADDR(&block->bb_h, index, cur); } -/* - * Allocate a new root block, fill it in. 
- */ -STATIC int /* error */ -xfs_alloc_newroot( - xfs_btree_cur_t *cur, /* btree cursor */ - int *stat) /* success/failure */ +STATIC int64_t +xfs_alloc_key_diff( + xfs_btree_cur_t *cur, + xfs_btree_key_t *key) { - int error; /* error return value */ - xfs_agblock_t lbno; /* left block number */ - xfs_buf_t *lbp; /* left btree buffer */ - xfs_alloc_block_t *left; /* left btree block */ - xfs_mount_t *mp; /* mount structure */ - xfs_agblock_t nbno; /* new block number */ - xfs_buf_t *nbp; /* new (root) buffer */ - xfs_alloc_block_t *new; /* new (root) btree block */ - int nptr; /* new value for key index, 1 or 2 */ - xfs_agblock_t rbno; /* right block number */ - xfs_buf_t *rbp; /* right btree buffer */ - xfs_alloc_block_t *right; /* right btree block */ + xfs_alloc_rec_incore_t *rec = &cur->bc_rec.a; + xfs_alloc_key_t *kp = &key->u.alloc; + int64_t diff; - mp = cur->bc_mp; + if (cur->bc_btnum == XFS_BTNUM_BNO) + return (int64_t)(be32_to_cpu(kp->ar_startblock)) - + rec->ar_startblock; - ASSERT(cur->bc_nlevels < XFS_AG_MAXLEVELS(mp)); - /* - * Get a buffer from the freelist blocks, for the new root. - */ - error = xfs_alloc_get_freelist(cur->bc_tp, - cur->bc_private.a.agbp, &nbno, 1); - if (error) - return error; - /* - * None available, we fail. - */ - if (nbno == NULLAGBLOCK) { - *stat = 0; - return 0; - } - xfs_trans_agbtree_delta(cur->bc_tp, 1); - nbp = xfs_btree_get_bufs(mp, cur->bc_tp, cur->bc_private.a.agno, nbno, - 0); - new = XFS_BUF_TO_ALLOC_BLOCK(nbp); - /* - * Set the root data in the a.g. freespace structure. - */ - { - xfs_agf_t *agf; /* a.g. 
freespace header */ - xfs_agnumber_t seqno; + diff = (int64_t)(be32_to_cpu(kp->ar_blockcount)) - rec->ar_blockcount; + if (!diff) + diff = (int64_t)(be32_to_cpu(kp->ar_startblock)) - + rec->ar_startblock; + return diff; +} - agf = XFS_BUF_TO_AGF(cur->bc_private.a.agbp); - agf->agf_roots[cur->bc_btnum] = cpu_to_be32(nbno); - be32_add(&agf->agf_levels[cur->bc_btnum], 1); - seqno = be32_to_cpu(agf->agf_seqno); - mp->m_perag[seqno].pagf_levels[cur->bc_btnum]++; - xfs_alloc_log_agf(cur->bc_tp, cur->bc_private.a.agbp, - XFS_AGF_ROOTS | XFS_AGF_LEVELS); - } - /* - * At the previous root level there are now two blocks: the old - * root, and the new block generated when it was split. - * We don't know which one the cursor is pointing at, so we - * set up variables "left" and "right" for each case. - */ - lbp = cur->bc_bufs[cur->bc_nlevels - 1]; - left = XFS_BUF_TO_ALLOC_BLOCK(lbp); -#ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, left, cur->bc_nlevels - 1, lbp))) - return error; -#endif - if (be32_to_cpu(left->bb_rightsib) != NULLAGBLOCK) { - /* - * Our block is left, pick up the right block. - */ - lbno = XFS_DADDR_TO_AGBNO(mp, XFS_BUF_ADDR(lbp)); - rbno = be32_to_cpu(left->bb_rightsib); - if ((error = xfs_btree_read_bufs(mp, cur->bc_tp, - cur->bc_private.a.agno, rbno, 0, &rbp, - XFS_ALLOC_BTREE_REF))) - return error; - right = XFS_BUF_TO_ALLOC_BLOCK(rbp); - if ((error = xfs_btree_check_sblock(cur, right, - cur->bc_nlevels - 1, rbp))) - return error; - nptr = 1; - } else { - /* - * Our block is right, pick up the left block. 
- */ - rbp = lbp; - right = left; - rbno = XFS_DADDR_TO_AGBNO(mp, XFS_BUF_ADDR(rbp)); - lbno = be32_to_cpu(right->bb_leftsib); - if ((error = xfs_btree_read_bufs(mp, cur->bc_tp, - cur->bc_private.a.agno, lbno, 0, &lbp, - XFS_ALLOC_BTREE_REF))) - return error; - left = XFS_BUF_TO_ALLOC_BLOCK(lbp); - if ((error = xfs_btree_check_sblock(cur, left, - cur->bc_nlevels - 1, lbp))) - return error; - nptr = 2; - } - /* - * Fill in the new block's btree header and log it. - */ - new->bb_magic = cpu_to_be32(xfs_magics[cur->bc_btnum]); - new->bb_level = cpu_to_be16(cur->bc_nlevels); - new->bb_numrecs = cpu_to_be16(2); - new->bb_leftsib = cpu_to_be32(NULLAGBLOCK); - new->bb_rightsib = cpu_to_be32(NULLAGBLOCK); - xfs_alloc_log_block(cur->bc_tp, nbp, XFS_BB_ALL_BITS); - ASSERT(lbno != NULLAGBLOCK && rbno != NULLAGBLOCK); - /* - * Fill in the key data in the new root. - */ - { - xfs_alloc_key_t *kp; /* btree key pointer */ +STATIC xfs_daddr_t +xfs_alloc_ptr_to_daddr( + xfs_btree_cur_t *cur, + xfs_btree_ptr_t *ptr) +{ + return XFS_AGB_TO_DADDR(cur->bc_mp, cur->bc_private.a.agno, + be32_to_cpu(ptr->u.alloc)); +} - kp = XFS_ALLOC_KEY_ADDR(new, 1, cur); - if (be16_to_cpu(left->bb_level) > 0) { - kp[0] = *XFS_ALLOC_KEY_ADDR(left, 1, cur); - kp[1] = *XFS_ALLOC_KEY_ADDR(right, 1, cur); - } else { - xfs_alloc_rec_t *rp; /* btree record pointer */ - - rp = XFS_ALLOC_REC_ADDR(left, 1, cur); - kp[0].ar_startblock = rp->ar_startblock; - kp[0].ar_blockcount = rp->ar_blockcount; - rp = XFS_ALLOC_REC_ADDR(right, 1, cur); - kp[1].ar_startblock = rp->ar_startblock; - kp[1].ar_blockcount = rp->ar_blockcount; - } +STATIC void +xfs_alloc_move_keys( + xfs_btree_cur_t *cur, + xfs_btree_key_t *src_key, + xfs_btree_key_t *dst_key, + int from, + int to, + int numkeys) +{ + BUG_ON(from < 0 || to < 0); + BUG_ON(from > 1000 || to > 1000); + BUG_ON(numkeys < 0); + + /* + * we can get a request to move zero records if the + * block is already empty. e.g. 
xfs_alloc_fix_freelist + * will delete the current entry and then reinsert a + * modified entry. If there is only a single entry in + * the block, this will result in an empty block. + */ + if (numkeys == 0) + return; + if (dst_key == NULL) { + /* moving within a block */ + xfs_alloc_key_t *kp = &src_key->u.alloc; + memmove(&kp[to], &kp[from], numkeys * sizeof(*kp)); + } else { + /* moving between blocks */ + memcpy(dst_key, src_key, numkeys * sizeof(xfs_alloc_key_t)); } - xfs_alloc_log_keys(cur, nbp, 1, 2); - /* - * Fill in the pointer data in the new root. - */ - { - xfs_alloc_ptr_t *pp; /* btree address pointer */ +} - pp = XFS_ALLOC_PTR_ADDR(new, 1, cur); - pp[0] = cpu_to_be32(lbno); - pp[1] = cpu_to_be32(rbno); +STATIC void +xfs_alloc_move_ptrs( + xfs_btree_cur_t *cur, + xfs_btree_ptr_t *src_ptr, + xfs_btree_ptr_t *dst_ptr, + int from, + int to, + int numptrs) +{ + BUG_ON(from < 0 || to < 0); + BUG_ON(from > 1000 || to > 1000); + BUG_ON(numptrs < 0); + if (numptrs == 0) + return; + if (dst_ptr == NULL) { + xfs_alloc_ptr_t *pp = &src_ptr->u.alloc; + memmove(&pp[to], &pp[from], numptrs * sizeof(*pp)); + } else { + memcpy(dst_ptr, src_ptr, numptrs * sizeof(xfs_alloc_ptr_t)); } - xfs_alloc_log_ptrs(cur, nbp, 1, 2); - /* - * Fix up the cursor. - */ - xfs_btree_setbuf(cur, cur->bc_nlevels, nbp); - cur->bc_ptrs[cur->bc_nlevels] = nptr; - cur->bc_nlevels++; - *stat = 1; - return 0; } -/* - * Move 1 record right from cur/level if possible. - * Update cur to reflect the new path.
- */ -STATIC int /* error */ -xfs_alloc_rshift( - xfs_btree_cur_t *cur, /* btree cursor */ - int level, /* level to shift record on */ - int *stat) /* success/failure */ -{ - int error; /* error return value */ - int i; /* loop index */ - xfs_alloc_key_t key; /* key value for leaf level upward */ - xfs_buf_t *lbp; /* buffer for left (current) block */ - xfs_alloc_block_t *left; /* left (current) btree block */ - xfs_buf_t *rbp; /* buffer for right neighbor block */ - xfs_alloc_block_t *right; /* right neighbor btree block */ - xfs_alloc_key_t *rkp; /* key pointer for right block */ - xfs_btree_cur_t *tcur; /* temporary cursor */ - - /* - * Set up variables for this block as "left". - */ - lbp = cur->bc_bufs[level]; - left = XFS_BUF_TO_ALLOC_BLOCK(lbp); -#ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, left, level, lbp))) - return error; -#endif - /* - * If we've got no right sibling then we can't shift an entry right. - */ - if (be32_to_cpu(left->bb_rightsib) == NULLAGBLOCK) { - *stat = 0; - return 0; - } - /* - * If the cursor entry is the one that would be moved, don't - * do it... it's too complicated. - */ - if (cur->bc_ptrs[level] >= be16_to_cpu(left->bb_numrecs)) { - *stat = 0; - return 0; - } - /* - * Set up the right neighbor as "right". - */ - if ((error = xfs_btree_read_bufs(cur->bc_mp, cur->bc_tp, - cur->bc_private.a.agno, be32_to_cpu(left->bb_rightsib), - 0, &rbp, XFS_ALLOC_BTREE_REF))) - return error; - right = XFS_BUF_TO_ALLOC_BLOCK(rbp); - if ((error = xfs_btree_check_sblock(cur, right, level, rbp))) - return error; - /* - * If it's full, it can't take another entry. - */ - if (be16_to_cpu(right->bb_numrecs) == XFS_ALLOC_BLOCK_MAXRECS(level, cur)) { - *stat = 0; - return 0; - } - /* - * Make a hole at the start of the right neighbor block, then - * copy the last left block entry to the hole. 
- */ - if (level > 0) { - xfs_alloc_key_t *lkp; /* key pointer for left block */ - xfs_alloc_ptr_t *lpp; /* address pointer for left block */ - xfs_alloc_ptr_t *rpp; /* address pointer for right block */ - - lkp = XFS_ALLOC_KEY_ADDR(left, be16_to_cpu(left->bb_numrecs), cur); - lpp = XFS_ALLOC_PTR_ADDR(left, be16_to_cpu(left->bb_numrecs), cur); - rkp = XFS_ALLOC_KEY_ADDR(right, 1, cur); - rpp = XFS_ALLOC_PTR_ADDR(right, 1, cur); -#ifdef DEBUG - for (i = be16_to_cpu(right->bb_numrecs) - 1; i >= 0; i--) { - if ((error = xfs_btree_check_sptr(cur, be32_to_cpu(rpp[i]), level))) - return error; - } -#endif - memmove(rkp + 1, rkp, be16_to_cpu(right->bb_numrecs) * sizeof(*rkp)); - memmove(rpp + 1, rpp, be16_to_cpu(right->bb_numrecs) * sizeof(*rpp)); -#ifdef DEBUG - if ((error = xfs_btree_check_sptr(cur, be32_to_cpu(*lpp), level))) - return error; -#endif - *rkp = *lkp; - *rpp = *lpp; - xfs_alloc_log_keys(cur, rbp, 1, be16_to_cpu(right->bb_numrecs) + 1); - xfs_alloc_log_ptrs(cur, rbp, 1, be16_to_cpu(right->bb_numrecs) + 1); - xfs_btree_check_key(cur->bc_btnum, rkp, rkp + 1); +STATIC void +xfs_alloc_move_recs( + xfs_btree_cur_t *cur, + xfs_btree_rec_t *src_rec, + xfs_btree_rec_t *dst_rec, + int from, + int to, + int numrecs) +{ + BUG_ON(from < 0 || to < 0); + BUG_ON(from > 1000 || to > 1000); + BUG_ON(numrecs < 0); + if (numrecs == 0) + return; + if (dst_rec == NULL) { + xfs_alloc_rec_t *rp = &src_rec->u.alloc; + memmove(&rp[to], &rp[from], numrecs * sizeof(*rp)); } else { - xfs_alloc_rec_t *lrp; /* record pointer for left block */ - xfs_alloc_rec_t *rrp; /* record pointer for right block */ - - lrp = XFS_ALLOC_REC_ADDR(left, be16_to_cpu(left->bb_numrecs), cur); - rrp = XFS_ALLOC_REC_ADDR(right, 1, cur); - memmove(rrp + 1, rrp, be16_to_cpu(right->bb_numrecs) * sizeof(*rrp)); - *rrp = *lrp; - xfs_alloc_log_recs(cur, rbp, 1, be16_to_cpu(right->bb_numrecs) + 1); - key.ar_startblock = rrp->ar_startblock; - key.ar_blockcount = rrp->ar_blockcount; - rkp = &key; - 
xfs_btree_check_rec(cur->bc_btnum, rrp, rrp + 1); + memcpy(dst_rec, src_rec, numrecs * sizeof(xfs_alloc_rec_t)); } - /* - * Decrement and log left's numrecs, bump and log right's numrecs. - */ - be16_add(&left->bb_numrecs, -1); - xfs_alloc_log_block(cur->bc_tp, lbp, XFS_BB_NUMRECS); - be16_add(&right->bb_numrecs, 1); - xfs_alloc_log_block(cur->bc_tp, rbp, XFS_BB_NUMRECS); - /* - * Using a temporary cursor, update the parent key values of the - * block on the right. - */ - if ((error = xfs_btree_dup_cursor(cur, &tcur))) - return error; - i = xfs_btree_lastrec(tcur, level); - XFS_WANT_CORRUPTED_GOTO(i == 1, error0); - if ((error = xfs_alloc_increment(tcur, level, &i)) || - (error = xfs_alloc_updkey(tcur, rkp, level + 1))) - goto error0; - xfs_btree_del_cursor(tcur, XFS_BTREE_NOERROR); - *stat = 1; - return 0; -error0: - xfs_btree_del_cursor(tcur, XFS_BTREE_ERROR); - return error; } -/* - * Split cur/level block in half. - * Return new block number and its first record (to be inserted into parent). - */ -STATIC int /* error */ -xfs_alloc_split( - xfs_btree_cur_t *cur, /* btree cursor */ - int level, /* level to split */ - xfs_agblock_t *bnop, /* output: block number allocated */ - xfs_alloc_key_t *keyp, /* output: first key of new block */ - xfs_btree_cur_t **curp, /* output: new cursor */ - int *stat) /* success/failure */ + +STATIC void +xfs_alloc_set_key( + xfs_btree_cur_t *cur, + xfs_btree_key_t *key_addr, + int index, + xfs_btree_key_t *newkey) { - int error; /* error return value */ - int i; /* loop index/record number */ - xfs_agblock_t lbno; /* left (current) block number */ - xfs_buf_t *lbp; /* buffer for left block */ - xfs_alloc_block_t *left; /* left (current) btree block */ - xfs_agblock_t rbno; /* right (new) block number */ - xfs_buf_t *rbp; /* buffer for right block */ - xfs_alloc_block_t *right; /* right (new) btree block */ + xfs_alloc_key_t *kp = &key_addr->u.alloc; - /* - * Allocate the new block from the freelist. 
- * If we can't do it, we're toast. Give up. - */ - error = xfs_alloc_get_freelist(cur->bc_tp, - cur->bc_private.a.agbp, &rbno, 1); - if (error) - return error; - if (rbno == NULLAGBLOCK) { - *stat = 0; - return 0; - } - xfs_trans_agbtree_delta(cur->bc_tp, 1); - rbp = xfs_btree_get_bufs(cur->bc_mp, cur->bc_tp, cur->bc_private.a.agno, - rbno, 0); - /* - * Set up the new block as "right". - */ - right = XFS_BUF_TO_ALLOC_BLOCK(rbp); - /* - * "Left" is the current (according to the cursor) block. - */ - lbp = cur->bc_bufs[level]; - left = XFS_BUF_TO_ALLOC_BLOCK(lbp); -#ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, left, level, lbp))) - return error; -#endif - /* - * Fill in the btree header for the new block. - */ - right->bb_magic = cpu_to_be32(xfs_magics[cur->bc_btnum]); - right->bb_level = left->bb_level; - right->bb_numrecs = cpu_to_be16(be16_to_cpu(left->bb_numrecs) / 2); - /* - * Make sure that if there's an odd number of entries now, that - * each new block will have the same number of entries. - */ - if ((be16_to_cpu(left->bb_numrecs) & 1) && - cur->bc_ptrs[level] <= be16_to_cpu(right->bb_numrecs) + 1) - be16_add(&right->bb_numrecs, 1); - i = be16_to_cpu(left->bb_numrecs) - be16_to_cpu(right->bb_numrecs) + 1; - /* - * For non-leaf blocks, copy keys and addresses over to the new block. 
- */ - if (level > 0) { - xfs_alloc_key_t *lkp; /* left btree key pointer */ - xfs_alloc_ptr_t *lpp; /* left btree address pointer */ - xfs_alloc_key_t *rkp; /* right btree key pointer */ - xfs_alloc_ptr_t *rpp; /* right btree address pointer */ - - lkp = XFS_ALLOC_KEY_ADDR(left, i, cur); - lpp = XFS_ALLOC_PTR_ADDR(left, i, cur); - rkp = XFS_ALLOC_KEY_ADDR(right, 1, cur); - rpp = XFS_ALLOC_PTR_ADDR(right, 1, cur); -#ifdef DEBUG - for (i = 0; i < be16_to_cpu(right->bb_numrecs); i++) { - if ((error = xfs_btree_check_sptr(cur, be32_to_cpu(lpp[i]), level))) - return error; - } -#endif - memcpy(rkp, lkp, be16_to_cpu(right->bb_numrecs) * sizeof(*rkp)); - memcpy(rpp, lpp, be16_to_cpu(right->bb_numrecs) * sizeof(*rpp)); - xfs_alloc_log_keys(cur, rbp, 1, be16_to_cpu(right->bb_numrecs)); - xfs_alloc_log_ptrs(cur, rbp, 1, be16_to_cpu(right->bb_numrecs)); - *keyp = *rkp; - } - /* - * For leaf blocks, copy records over to the new block. - */ - else { - xfs_alloc_rec_t *lrp; /* left btree record pointer */ - xfs_alloc_rec_t *rrp; /* right btree record pointer */ - - lrp = XFS_ALLOC_REC_ADDR(left, i, cur); - rrp = XFS_ALLOC_REC_ADDR(right, 1, cur); - memcpy(rrp, lrp, be16_to_cpu(right->bb_numrecs) * sizeof(*rrp)); - xfs_alloc_log_recs(cur, rbp, 1, be16_to_cpu(right->bb_numrecs)); - keyp->ar_startblock = rrp->ar_startblock; - keyp->ar_blockcount = rrp->ar_blockcount; - } - /* - * Find the left block number by looking in the buffer. - * Adjust numrecs, sibling pointers. - */ - lbno = XFS_DADDR_TO_AGBNO(cur->bc_mp, XFS_BUF_ADDR(lbp)); - be16_add(&left->bb_numrecs, -(be16_to_cpu(right->bb_numrecs))); - right->bb_rightsib = left->bb_rightsib; - left->bb_rightsib = cpu_to_be32(rbno); - right->bb_leftsib = cpu_to_be32(lbno); - xfs_alloc_log_block(cur->bc_tp, rbp, XFS_BB_ALL_BITS); - xfs_alloc_log_block(cur->bc_tp, lbp, XFS_BB_NUMRECS | XFS_BB_RIGHTSIB); - /* - * If there's a block to the new block's right, make that block - * point back to right instead of to left. 
- */ - if (be32_to_cpu(right->bb_rightsib) != NULLAGBLOCK) { - xfs_alloc_block_t *rrblock; /* rr btree block */ - xfs_buf_t *rrbp; /* buffer for rrblock */ - - if ((error = xfs_btree_read_bufs(cur->bc_mp, cur->bc_tp, - cur->bc_private.a.agno, be32_to_cpu(right->bb_rightsib), 0, - &rrbp, XFS_ALLOC_BTREE_REF))) - return error; - rrblock = XFS_BUF_TO_ALLOC_BLOCK(rrbp); - if ((error = xfs_btree_check_sblock(cur, rrblock, level, rrbp))) - return error; - rrblock->bb_leftsib = cpu_to_be32(rbno); - xfs_alloc_log_block(cur->bc_tp, rrbp, XFS_BB_LEFTSIB); - } - /* - * If the cursor is really in the right block, move it there. - * If it's just pointing past the last entry in left, then we'll - * insert there, so don't change anything in that case. - */ - if (cur->bc_ptrs[level] > be16_to_cpu(left->bb_numrecs) + 1) { - xfs_btree_setbuf(cur, level, rbp); - cur->bc_ptrs[level] -= be16_to_cpu(left->bb_numrecs); - } - /* - * If there are more levels, we'll need another cursor which refers to - * the right block, no matter where this cursor was. - */ - if (level + 1 < cur->bc_nlevels) { - if ((error = xfs_btree_dup_cursor(cur, curp))) - return error; - (*curp)->bc_ptrs[level + 1]++; - } - *bnop = rbno; - *stat = 1; - return 0; + kp[index] = newkey->u.alloc; +} + +STATIC void +xfs_alloc_set_ptr( + xfs_btree_cur_t *cur, + xfs_btree_ptr_t *ptr_addr, + int index, + xfs_btree_ptr_t *newptr) +{ + xfs_alloc_ptr_t *pp = &ptr_addr->u.alloc; + + pp[index] = newptr->u.alloc; +} + +STATIC void +xfs_alloc_set_rec( + xfs_btree_cur_t *cur, + xfs_btree_rec_t *rec_addr, + int index, + xfs_btree_rec_t *newrec) +{ + xfs_alloc_rec_t *rp = &rec_addr->u.alloc; + + rp[index] = newrec->u.alloc; } /* - * Update keys at all levels from here to the root along the cursor's path. + * Log keys from a btree block (nonleaf). 
*/ -STATIC int /* error */ -xfs_alloc_updkey( +STATIC void +xfs_alloc_log_keys( xfs_btree_cur_t *cur, /* btree cursor */ - xfs_alloc_key_t *keyp, /* new key value to update to */ - int level) /* starting level for update */ + xfs_buf_t *bp, /* buffer containing btree block */ + int kfirst, /* index of first key to log */ + int klast) /* index of last key to log */ { - int ptr; /* index of key in block */ - - /* - * Go up the tree from this level toward the root. - * At each level, update the key value to the value input. - * Stop when we reach a level where the cursor isn't pointing - * at the first entry in the block. - */ - for (ptr = 1; ptr == 1 && level < cur->bc_nlevels; level++) { - xfs_alloc_block_t *block; /* btree block */ - xfs_buf_t *bp; /* buffer for block */ -#ifdef DEBUG - int error; /* error return value */ -#endif - xfs_alloc_key_t *kp; /* ptr to btree block keys */ + xfs_alloc_block_t *block; /* btree block to log from */ + int first; /* first byte offset logged */ + xfs_alloc_key_t *kp; /* key pointer in btree block */ + int last; /* last byte offset logged */ - bp = cur->bc_bufs[level]; - block = XFS_BUF_TO_ALLOC_BLOCK(bp); -#ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, block, level, bp))) - return error; -#endif - ptr = cur->bc_ptrs[level]; - kp = XFS_ALLOC_KEY_ADDR(block, ptr, cur); - *kp = *keyp; - xfs_alloc_log_keys(cur, bp, ptr, ptr); - } - return 0; + XFS_BTREE_TRACE_CURSOR(cur, ENTRY); + XFS_BTREE_TRACE_ARGBII(cur, bp, kfirst, klast); + block = XFS_BUF_TO_ALLOC_BLOCK(bp); + kp = XFS_ALLOC_KEY_ADDR(block, 1, cur); + first = (int)((xfs_caddr_t)&kp[kfirst - 1] - (xfs_caddr_t)block); + last = (int)(((xfs_caddr_t)&kp[klast] - 1) - (xfs_caddr_t)block); + xfs_trans_log_buf(cur->bc_tp, bp, first, last); + XFS_BTREE_TRACE_CURSOR(cur, EXIT); } /* - * Externally visible routines. + * Log block pointer fields from a btree block (nonleaf). 
*/ +STATIC void +xfs_alloc_log_ptrs( + xfs_btree_cur_t *cur, /* btree cursor */ + xfs_buf_t *bp, /* buffer containing btree block */ + int pfirst, /* index of first pointer to log */ + int plast) /* index of last pointer to log */ +{ + xfs_alloc_block_t *block; /* btree block to log from */ + int first; /* first byte offset logged */ + int last; /* last byte offset logged */ + xfs_alloc_ptr_t *pp; /* block-pointer pointer in btree blk */ + + XFS_BTREE_TRACE_CURSOR(cur, ENTRY); + XFS_BTREE_TRACE_ARGBII(cur, bp, pfirst, plast); + block = XFS_BUF_TO_ALLOC_BLOCK(bp); + pp = XFS_ALLOC_PTR_ADDR(block, 1, cur); + first = (int)((xfs_caddr_t)&pp[pfirst - 1] - (xfs_caddr_t)block); + last = (int)(((xfs_caddr_t)&pp[plast] - 1) - (xfs_caddr_t)block); + xfs_trans_log_buf(cur->bc_tp, bp, first, last); + XFS_BTREE_TRACE_CURSOR(cur, EXIT); +} /* - * Decrement cursor by one record at the level. - * For nonzero levels the leaf-ward information is untouched. + * Log records from a btree block (leaf). */ -int /* error */ -xfs_alloc_decrement( +STATIC void +xfs_alloc_log_recs( xfs_btree_cur_t *cur, /* btree cursor */ - int level, /* level in btree, 0 is leaf */ - int *stat) /* success/failure */ + xfs_buf_t *bp, /* buffer containing btree block */ + int rfirst, /* index of first record to log */ + int rlast) /* index of last record to log */ { - xfs_alloc_block_t *block; /* btree block */ - int error; /* error return value */ - int lev; /* btree level */ + xfs_alloc_block_t *block; /* btree block to log from */ + int first; /* first byte offset logged */ + int last; /* last byte offset logged */ + xfs_alloc_rec_t *rp; /* record pointer for btree block */ - ASSERT(level < cur->bc_nlevels); - /* - * Read-ahead to the left at this level. - */ - xfs_btree_readahead(cur, level, XFS_BTCUR_LEFTRA); - /* - * Decrement the ptr at this level. If we're still in the block - * then we're done. 
- */ - if (--cur->bc_ptrs[level] > 0) { - *stat = 1; - return 0; - } - /* - * Get a pointer to the btree block. - */ - block = XFS_BUF_TO_ALLOC_BLOCK(cur->bc_bufs[level]); + + XFS_BTREE_TRACE_CURSOR(cur, ENTRY); + XFS_BTREE_TRACE_ARGBII(cur, bp, rfirst, rlast); + block = XFS_BUF_TO_ALLOC_BLOCK(bp); + rp = XFS_ALLOC_REC_ADDR(block, 1, cur); #ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, block, level, - cur->bc_bufs[level]))) - return error; -#endif - /* - * If we just went off the left edge of the tree, return failure. - */ - if (be32_to_cpu(block->bb_leftsib) == NULLAGBLOCK) { - *stat = 0; - return 0; + { + xfs_agf_t *agf; + xfs_alloc_rec_t *p; + + agf = XFS_BUF_TO_AGF(cur->bc_private.a.agbp); + for (p = &rp[rfirst - 1]; p <= &rp[rlast - 1]; p++) + ASSERT(be32_to_cpu(p->ar_startblock) + + be32_to_cpu(p->ar_blockcount) <= + be32_to_cpu(agf->agf_length)); } +#endif + first = (int)((xfs_caddr_t)&rp[rfirst - 1] - (xfs_caddr_t)block); + last = (int)(((xfs_caddr_t)&rp[rlast] - 1) - (xfs_caddr_t)block); + xfs_trans_log_buf(cur->bc_tp, bp, first, last); + XFS_BTREE_TRACE_CURSOR(cur, EXIT); +} + +static const struct xfs_btree_record_ops xfs_alloc_recops = { + .get_minrecs = xfs_alloc_get_minrecs, + .get_maxrecs = xfs_alloc_get_maxrecs, + .get_numrecs = xfs_btree_get_numrecs, + .set_numrecs = xfs_btree_set_numrecs, + + .init_key_from_rec = xfs_alloc_init_key_from_rec, + .init_ptr_from_cur = xfs_alloc_init_ptr_from_cur, + .init_rec_from_key = xfs_alloc_init_rec_from_key, + .init_rec_from_cur = xfs_alloc_init_rec_from_cur, + + .key_addr = xfs_alloc_key_addr, + .ptr_addr = xfs_alloc_ptr_addr, + .rec_addr = xfs_alloc_rec_addr, + + .key_diff = xfs_alloc_key_diff, + .ptr_to_daddr = xfs_alloc_ptr_to_daddr, + + .move_keys = xfs_alloc_move_keys, + .move_ptrs = xfs_alloc_move_ptrs, + .move_recs = xfs_alloc_move_recs, + + .set_key = xfs_alloc_set_key, + .set_ptr = xfs_alloc_set_ptr, + .set_rec = xfs_alloc_set_rec, + + .log_keys = xfs_alloc_log_keys, + .log_ptrs = 
xfs_alloc_log_ptrs, + .log_recs = xfs_alloc_log_recs, + + .check_ptrs = xfs_btree_check_sptr, +}; + +STATIC void +xfs_alloc_setroot( + xfs_btree_cur_t *cur, + xfs_btree_ptr_t *ptr, + int inc) +{ + xfs_agf_t *agf; /* a.g. freespace header */ + xfs_agnumber_t seqno; + + agf = XFS_BUF_TO_AGF(cur->bc_private.a.agbp); + + BUG_ON(ptr->u.alloc == 0); + agf->agf_roots[cur->bc_btnum] = ptr->u.alloc; + be32_add(&agf->agf_levels[cur->bc_btnum], inc); + + seqno = be32_to_cpu(agf->agf_seqno); + cur->bc_mp->m_perag[seqno].pagf_levels[cur->bc_btnum] += inc; + + xfs_alloc_log_agf(cur->bc_tp, cur->bc_private.a.agbp, + XFS_AGF_ROOTS | XFS_AGF_LEVELS); +} + +STATIC int +xfs_alloc_killroot( + xfs_btree_cur_t *cur, + int level, + xfs_btree_ptr_t *newroot) +{ + xfs_agf_t *agf; /* allocation group freelist header */ + xfs_agblock_t bno; /* old root block number */ + int error; + + agf = XFS_BUF_TO_AGF(cur->bc_private.a.agbp); + /* - * March up the tree decrementing pointers. - * Stop when we don't go off the left edge of a block. + * Set the root entry in the agf structure, + * decreasing the level by 1. */ - for (lev = level + 1; lev < cur->bc_nlevels; lev++) { - if (--cur->bc_ptrs[lev] > 0) - break; - /* - * Read-ahead the left block, we're going to read it - * in the next loop. - */ - xfs_btree_readahead(cur, lev, XFS_BTCUR_LEFTRA); - } + bno = be32_to_cpu(agf->agf_roots[cur->bc_btnum]); + xfs_alloc_setroot(cur, newroot, -1); /* - * If we went off the root then we are seriously confused. + * Put this buffer/block on the ag's freelist. */ - ASSERT(lev < cur->bc_nlevels); + BUG_ON(bno == 0); + error = xfs_alloc_put_freelist(cur->bc_tp, + cur->bc_private.a.agbp, NULL, bno, 1); + if (error) + return error; /* - * Now walk back down the tree, fixing up the cursor's buffer - * pointers and key numbers. 
+ * Since blocks move to the free list without the + * coordination used in xfs_bmap_finish, we can't allow + * block to be available for reallocation and + * non-transaction writing (user data) until we know + * that the transaction that moved it to the free list + * is permanently on disk. We track the blocks by + * declaring these blocks as "busy"; the busy list is + * maintained on a per-ag basis and each transaction + * records which entries should be removed when the + * iclog commits to disk. If a busy block is + * allocated, the iclog is pushed up to the LSN + * that freed the block. */ - for (block = XFS_BUF_TO_ALLOC_BLOCK(cur->bc_bufs[lev]); lev > level; ) { - xfs_agblock_t agbno; /* block number of btree block */ - xfs_buf_t *bp; /* buffer pointer for block */ - - agbno = be32_to_cpu(*XFS_ALLOC_PTR_ADDR(block, cur->bc_ptrs[lev], cur)); - if ((error = xfs_btree_read_bufs(cur->bc_mp, cur->bc_tp, - cur->bc_private.a.agno, agbno, 0, &bp, - XFS_ALLOC_BTREE_REF))) - return error; - lev--; - xfs_btree_setbuf(cur, lev, bp); - block = XFS_BUF_TO_ALLOC_BLOCK(bp); - if ((error = xfs_btree_check_sblock(cur, block, lev, bp))) - return error; - cur->bc_ptrs[lev] = be16_to_cpu(block->bb_numrecs); - } - *stat = 1; - return 0; -} - -/* - * Delete the record pointed to by cur. - * The cursor refers to the place where the record was (could be inserted) - * when the operation returns. - */ -int /* error */ -xfs_alloc_delete( - xfs_btree_cur_t *cur, /* btree cursor */ - int *stat) /* success/failure */ -{ - int error; /* error return value */ - int i; /* result code */ - int level; /* btree level */ + xfs_alloc_mark_busy(cur->bc_tp, + be32_to_cpu(agf->agf_seqno), bno, 1); + xfs_trans_agbtree_delta(cur->bc_tp, -1); /* - * Go up the tree, starting at leaf level. - * If 2 is returned then a join was done; go to the next level. - * Otherwise we are done. + * Update the cursor so there's one fewer level. 
*/ - for (level = 0, i = 2; i == 2; level++) { - if ((error = xfs_alloc_delrec(cur, level, &i))) - return error; - } - if (i == 0) { - for (level = 1; level < cur->bc_nlevels; level++) { - if (cur->bc_ptrs[level] == 0) { - if ((error = xfs_alloc_decrement(cur, level, &i))) - return error; - break; - } - } - } - *stat = i; + xfs_btree_setbuf(cur, level, NULL); + cur->bc_nlevels--; return 0; } /* - * Get the data from the pointed-to record. + * update the longest extent in the AGF */ -int /* error */ -xfs_alloc_get_rec( - xfs_btree_cur_t *cur, /* btree cursor */ - xfs_agblock_t *bno, /* output: starting block of extent */ - xfs_extlen_t *len, /* output: length of extent */ - int *stat) /* output: success/failure */ +STATIC int +xfs_alloc_update_lastrec( + xfs_btree_cur_t *cur, + xfs_btree_block_t *block) { - xfs_alloc_block_t *block; /* btree block */ -#ifdef DEBUG - int error; /* error return value */ -#endif - int ptr; /* record number */ + xfs_agf_t *agf; /* allocation group freelist header */ + xfs_alloc_rec_t *rrp; /* right block record pointer */ + int numrecs; - ptr = cur->bc_ptrs[0]; - block = XFS_BUF_TO_ALLOC_BLOCK(cur->bc_bufs[0]); -#ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, block, 0, cur->bc_bufs[0]))) - return error; -#endif + if (cur->bc_btnum != XFS_BTNUM_CNT) + return 0; + + agf = XFS_BUF_TO_AGF(cur->bc_private.a.agbp); /* - * Off the right end or left end, return failure. + * There are still records in the block. Grab the size + * from the last one. */ - if (ptr > be16_to_cpu(block->bb_numrecs) || ptr <= 0) { - *stat = 0; - return 0; + numrecs = xfs_btree_get_numrecs(cur, block); + if (numrecs) { + rrp = XFS_ALLOC_REC_ADDR(block, numrecs, cur); + ASSERT(be32_to_cpu(rrp->ar_blockcount) >= + be32_to_cpu(agf->agf_longest)); + agf->agf_longest = rrp->ar_blockcount; } /* - * Point to the record and extract its data. + * No free extents left. 
*/ - { - xfs_alloc_rec_t *rec; /* record data */ + else + agf->agf_longest = 0; - rec = XFS_ALLOC_REC_ADDR(block, ptr, cur); - *bno = be32_to_cpu(rec->ar_startblock); - *len = be32_to_cpu(rec->ar_blockcount); - } - *stat = 1; + cur->bc_mp->m_perag[be32_to_cpu(agf->agf_seqno)].pagf_longest = + be32_to_cpu(agf->agf_longest); + xfs_alloc_log_agf(cur->bc_tp, cur->bc_private.a.agbp, XFS_AGF_LONGEST); return 0; } +static const struct xfs_btree_cur_ops xfs_alloc_curops = { + .update_lastrec = xfs_alloc_update_lastrec, + .set_root = xfs_alloc_setroot, + .new_root = xfs_btree_newroot, + .kill_root = xfs_alloc_killroot, +}; + +#if defined(XFS_BTREE_TRACE) + /* - * Increment cursor by one record at the level. - * For nonzero levels the leaf-ward information is untouched. + * Global alloc btree trace buffer */ -int /* error */ -xfs_alloc_increment( - xfs_btree_cur_t *cur, /* btree cursor */ - int level, /* level in btree, 0 is leaf */ - int *stat) /* success/failure */ +ktrace_t *xfs_allocbt_trace_buf; +/* + * Add a trace buffer entry for the arguments given to the routine, + * generic form. 
+ */ +STATIC void +xfs_alloc_trace_enter( + const char *func, + xfs_btree_cur_t *cur, + char *s, + int type, + int line, + __psunsigned_t a0, + __psunsigned_t a1, + __psunsigned_t a2, + __psunsigned_t a3, + __psunsigned_t a4, + __psunsigned_t a5, + __psunsigned_t a6, + __psunsigned_t a7, + __psunsigned_t a8, + __psunsigned_t a9, + __psunsigned_t a10) +{ + ktrace_enter(xfs_allocbt_trace_buf, + (void *)(__psint_t)type, + (void *)func, (void *)s, (void *)ip, (void *)cur, + (void *)a0, (void *)a1, (void *)a2, (void *)a3, + (void *)a4, (void *)a5, (void *)a6, (void *)a7, + (void *)a8, (void *)a9, (void *)a10); +} + +STATIC void +xfs_alloc_trace_cursor( + xfs_btree_cur_t *cur, + __uint32_t *s0, + __uint64_t *l0, + __uint64_t *l1) +{ + *s0 = cur->bc_private.a.agno; + *l0 = cur->bc_rec.a.ar_startblock; + *l1 = cur->bc_rec.a.ar_blockcount; +} + +STATIC void +xfs_alloc_trace_record( + xfs_btree_cur_t *cur, + xfs_btree_rec_t *rec, + __uint64_t *l0, + __uint64_t *l1, + __uint64_t *l2) { - xfs_alloc_block_t *block; /* btree block */ - xfs_buf_t *bp; /* tree block buffer */ - int error; /* error return value */ - int lev; /* btree level */ + *l0 = be32_to_cpu(&rec->u.alloc.ar_startblock); + *l1 = be32_to_cpu(&rec->u.alloc.ar_blockcount); + *l2 = 0; +} - ASSERT(level < cur->bc_nlevels); - /* - * Read-ahead to the right at this level. - */ - xfs_btree_readahead(cur, level, XFS_BTCUR_RIGHTRA); - /* - * Get a pointer to the btree block. - */ - bp = cur->bc_bufs[level]; - block = XFS_BUF_TO_ALLOC_BLOCK(bp); -#ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, block, level, bp))) - return error; +static const struct xfs_btree_trc_ops xfs_alloc_trcops = { + .enter = xfs_alloc_trace_enter, + .cursor = xfs_alloc_trace_cursor, + .record = xfs_alloc_trace_record, +}; #endif - /* - * Increment the ptr at this level. If we're still in the block - * then we're done. 
- */ - if (++cur->bc_ptrs[level] <= be16_to_cpu(block->bb_numrecs)) { - *stat = 1; - return 0; - } - /* - * If we just went off the right edge of the tree, return failure. - */ - if (be32_to_cpu(block->bb_rightsib) == NULLAGBLOCK) { - *stat = 0; - return 0; - } - /* - * March up the tree incrementing pointers. - * Stop when we don't go off the right edge of a block. - */ - for (lev = level + 1; lev < cur->bc_nlevels; lev++) { - bp = cur->bc_bufs[lev]; - block = XFS_BUF_TO_ALLOC_BLOCK(bp); -#ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, block, lev, bp))) - return error; + +void +xfs_alloc_init_cursor( + xfs_btree_cur_t *cur) +{ + cur->bc_flags = 0; + if (cur->bc_btnum == XFS_BTNUM_CNT) + cur->bc_flags |= XFS_BTREE_LASTREC_UPDATE; + cur->bc_curops = &xfs_alloc_curops; + cur->bc_blkops = &xfs_alloc_blkops; + cur->bc_recops = &xfs_alloc_recops; +#if defined(XFS_BTREE_TRACE) + cur->bc_trcops = &xfs_alloc_trcops; #endif - if (++cur->bc_ptrs[lev] <= be16_to_cpu(block->bb_numrecs)) - break; - /* - * Read-ahead the right block, we're going to read it - * in the next loop. - */ - xfs_btree_readahead(cur, lev, XFS_BTCUR_RIGHTRA); - } - /* - * If we went off the root then we are seriously confused. - */ - ASSERT(lev < cur->bc_nlevels); - /* - * Now walk back down the tree, fixing up the cursor's buffer - * pointers and key numbers. - */ - for (bp = cur->bc_bufs[lev], block = XFS_BUF_TO_ALLOC_BLOCK(bp); - lev > level; ) { - xfs_agblock_t agbno; /* block number of btree block */ - - agbno = be32_to_cpu(*XFS_ALLOC_PTR_ADDR(block, cur->bc_ptrs[lev], cur)); - if ((error = xfs_btree_read_bufs(cur->bc_mp, cur->bc_tp, - cur->bc_private.a.agno, agbno, 0, &bp, - XFS_ALLOC_BTREE_REF))) - return error; - lev--; - xfs_btree_setbuf(cur, lev, bp); - block = XFS_BUF_TO_ALLOC_BLOCK(bp); - if ((error = xfs_btree_check_sblock(cur, block, lev, bp))) - return error; - cur->bc_ptrs[lev] = 1; - } - *stat = 1; - return 0; } /* - * Insert the current record at the point referenced by cur. 
- * The cursor may be inconsistent on return if splits have been done. + * ALLOC functions that are not covered by core btree code. + * Externally visible routines. + */ + +/* + * Update the record referred to by cur, to the value given by [bno, len]. + * This either works (return 0) or gets an EFSCORRUPTED error. */ int /* error */ -xfs_alloc_insert( - xfs_btree_cur_t *cur, /* btree cursor */ - int *stat) /* success/failure */ +xfs_alloc_update( + xfs_btree_cur_t *cur, /* btree cursor */ + xfs_agblock_t bno, /* starting block of extent */ + xfs_extlen_t len) /* length of extent */ { - int error; /* error return value */ - int i; /* result value, 0 for failure */ - int level; /* current level number in btree */ - xfs_agblock_t nbno; /* new block number (split result) */ - xfs_btree_cur_t *ncur; /* new cursor (split result) */ - xfs_alloc_rec_t nrec; /* record being inserted this level */ - xfs_btree_cur_t *pcur; /* previous level's cursor */ - - level = 0; - nbno = NULLAGBLOCK; - nrec.ar_startblock = cpu_to_be32(cur->bc_rec.a.ar_startblock); - nrec.ar_blockcount = cpu_to_be32(cur->bc_rec.a.ar_blockcount); - ncur = NULL; - pcur = cur; - /* - * Loop going up the tree, starting at the leaf level. - * Stop when we don't get a split block, that must mean that - * the insert is finished with this level. - */ - do { - /* - * Insert nrec/nbno into this level of the tree. - * Note if we fail, nbno will be null. - */ - if ((error = xfs_alloc_insrec(pcur, level++, &nbno, &nrec, &ncur, - &i))) { - if (pcur != cur) - xfs_btree_del_cursor(pcur, XFS_BTREE_ERROR); - return error; - } - /* - * See if the cursor we just used is trash. - * Can't trash the caller's cursor, but otherwise we should - * if ncur is a new cursor or we're about to be done. - */ - if (pcur != cur && (ncur || nbno == NULLAGBLOCK)) { - cur->bc_nlevels = pcur->bc_nlevels; - xfs_btree_del_cursor(pcur, XFS_BTREE_NOERROR); - } - /* - * If we got a new cursor, switch to it. 
- */ - if (ncur) { - pcur = ncur; - ncur = NULL; - } - } while (nbno != NULLAGBLOCK); - *stat = i; - return 0; + xfs_btree_rec_t rec; + + rec.u.alloc.ar_startblock = cpu_to_be32(bno); + rec.u.alloc.ar_blockcount = cpu_to_be32(len); + return xfs_btree_update(cur, &rec); } /* @@ -2105,7 +828,7 @@ xfs_alloc_lookup_eq( { cur->bc_rec.a.ar_startblock = bno; cur->bc_rec.a.ar_blockcount = len; - return xfs_alloc_lookup(cur, XFS_LOOKUP_EQ, stat); + return xfs_btree_lookup(cur, XFS_LOOKUP_EQ, stat); } /* @@ -2121,7 +844,7 @@ xfs_alloc_lookup_ge( { cur->bc_rec.a.ar_startblock = bno; cur->bc_rec.a.ar_blockcount = len; - return xfs_alloc_lookup(cur, XFS_LOOKUP_GE, stat); + return xfs_btree_lookup(cur, XFS_LOOKUP_GE, stat); } /* @@ -2137,75 +860,53 @@ xfs_alloc_lookup_le( { cur->bc_rec.a.ar_startblock = bno; cur->bc_rec.a.ar_blockcount = len; - return xfs_alloc_lookup(cur, XFS_LOOKUP_LE, stat); + return xfs_btree_lookup(cur, XFS_LOOKUP_LE, stat); } /* - * Update the record referred to by cur, to the value given by [bno, len]. - * This either works (return 0) or gets an EFSCORRUPTED error. + * Get the data from the pointed-to record. */ int /* error */ -xfs_alloc_update( +xfs_alloc_get_rec( xfs_btree_cur_t *cur, /* btree cursor */ - xfs_agblock_t bno, /* starting block of extent */ - xfs_extlen_t len) /* length of extent */ + xfs_agblock_t *bno, /* output: starting block of extent */ + xfs_extlen_t *len, /* output: length of extent */ + int *stat) /* output: success/failure */ { - xfs_alloc_block_t *block; /* btree block to update */ + xfs_btree_block_t *block; /* btree block */ + xfs_btree_rec_t *rec; /* record data */ + xfs_buf_t *bp; /* buffer containing btree block */ +#ifdef DEBUG int error; /* error return value */ - int ptr; /* current record number (updating) */ +#endif + int ptr; /* record number */ - ASSERT(len > 0); - /* - * Pick up the a.g. freelist struct and the current block. 
- */ - block = XFS_BUF_TO_ALLOC_BLOCK(cur->bc_bufs[0]); + XFS_BTREE_TRACE_CURSOR(cur, ENTRY); + XFS_BTREE_TRACE_ARGFFF(cur, *ino, *fcnt, *free); + + ptr = cur->bc_ptrs[0]; + block = xfs_alloc_get_block(cur, 0, &bp); #ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, block, 0, cur->bc_bufs[0]))) + error = xfs_btree_check_sblock(cur, block, 0, bp); + if (error) return error; #endif /* - * Get the address of the rec to be updated. - */ - ptr = cur->bc_ptrs[0]; - { - xfs_alloc_rec_t *rp; /* pointer to updated record */ - - rp = XFS_ALLOC_REC_ADDR(block, ptr, cur); - /* - * Fill in the new contents and log them. - */ - rp->ar_startblock = cpu_to_be32(bno); - rp->ar_blockcount = cpu_to_be32(len); - xfs_alloc_log_recs(cur, cur->bc_bufs[0], ptr, ptr); - } - /* - * If it's the by-size btree and it's the last leaf block and - * it's the last record... then update the size of the longest - * extent in the a.g., which we cache in the a.g. freelist header. + * Off the right end or left end, return failure. */ - if (cur->bc_btnum == XFS_BTNUM_CNT && - be32_to_cpu(block->bb_rightsib) == NULLAGBLOCK && - ptr == be16_to_cpu(block->bb_numrecs)) { - xfs_agf_t *agf; /* a.g. freespace header */ - xfs_agnumber_t seqno; - - agf = XFS_BUF_TO_AGF(cur->bc_private.a.agbp); - seqno = be32_to_cpu(agf->agf_seqno); - cur->bc_mp->m_perag[seqno].pagf_longest = len; - agf->agf_longest = cpu_to_be32(len); - xfs_alloc_log_agf(cur->bc_tp, cur->bc_private.a.agbp, - XFS_AGF_LONGEST); + if (ptr > be16_to_cpu(block->bb_h.bb_numrecs) || ptr <= 0) { + XFS_BTREE_TRACE_CURSOR(cur, EXIT); + *stat = 0; + return 0; } /* - * Updating first record in leaf. Pass new key value up to our parent. + * Point to the record and extract its data. 
*/ - if (ptr == 1) { - xfs_alloc_key_t key; /* key containing [bno, len] */ - - key.ar_startblock = cpu_to_be32(bno); - key.ar_blockcount = cpu_to_be32(len); - if ((error = xfs_alloc_updkey(cur, &key, 1))) - return error; - } + rec = xfs_alloc_rec_addr(cur, ptr, block); + *bno = be32_to_cpu(rec->u.alloc.ar_startblock); + *len = be32_to_cpu(rec->u.alloc.ar_blockcount); + XFS_BTREE_TRACE_CURSOR(cur, EXIT); + *stat = 1; return 0; } + Index: 2.6.x-xfs-new/fs/xfs/xfs_alloc_btree.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_alloc_btree.h 2007-02-07 13:24:32.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_alloc_btree.h 2007-11-06 19:40:29.702675076 +1100 @@ -94,6 +94,8 @@ typedef struct xfs_btree_sblock xfs_allo #define XFS_ALLOC_PTR_ADDR(bb,i,cur) \ XFS_BTREE_PTR_ADDR(xfs_alloc, bb, i, XFS_ALLOC_BLOCK_MAXRECS(1, cur)) +extern void xfs_alloc_init_cursor(struct xfs_btree_cur *cur); + /* * Decrement cursor by one record at the level. * For nonzero levels the leaf-ward information is untouched. 
Index: 2.6.x-xfs-new/fs/xfs/xfs_bmap.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_bmap.c 2007-11-05 10:08:51.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_bmap.c 2007-11-06 19:40:29.710674046 +1100 @@ -817,10 +817,10 @@ xfs_bmap_add_extent_delay_real( RIGHT.br_blockcount, &i))) goto done; ASSERT(i == 1); - if ((error = xfs_bmbt_delete(cur, &i))) + if ((error = xfs_btree_delete(cur, &i))) goto done; ASSERT(i == 1); - if ((error = xfs_bmbt_decrement(cur, 0, &i))) + if ((error = xfs_btree_decrement(cur, 0, &i))) goto done; ASSERT(i == 1); if ((error = xfs_bmbt_update(cur, LEFT.br_startoff, @@ -930,7 +930,7 @@ xfs_bmap_add_extent_delay_real( goto done; ASSERT(i == 0); cur->bc_rec.b.br_state = XFS_EXT_NORM; - if ((error = xfs_bmbt_insert(cur, &i))) + if ((error = xfs_btree_insert(cur, &i))) goto done; ASSERT(i == 1); } @@ -1006,7 +1006,7 @@ xfs_bmap_add_extent_delay_real( goto done; ASSERT(i == 0); cur->bc_rec.b.br_state = XFS_EXT_NORM; - if ((error = xfs_bmbt_insert(cur, &i))) + if ((error = xfs_btree_insert(cur, &i))) goto done; ASSERT(i == 1); } @@ -1096,7 +1096,7 @@ xfs_bmap_add_extent_delay_real( goto done; ASSERT(i == 0); cur->bc_rec.b.br_state = XFS_EXT_NORM; - if ((error = xfs_bmbt_insert(cur, &i))) + if ((error = xfs_btree_insert(cur, &i))) goto done; ASSERT(i == 1); } @@ -1151,7 +1151,7 @@ xfs_bmap_add_extent_delay_real( goto done; ASSERT(i == 0); cur->bc_rec.b.br_state = XFS_EXT_NORM; - if ((error = xfs_bmbt_insert(cur, &i))) + if ((error = xfs_btree_insert(cur, &i))) goto done; ASSERT(i == 1); } @@ -1378,16 +1378,16 @@ xfs_bmap_add_extent_unwritten_real( RIGHT.br_blockcount, &i))) goto done; ASSERT(i == 1); - if ((error = xfs_bmbt_delete(cur, &i))) + if ((error = xfs_btree_delete(cur, &i))) goto done; ASSERT(i == 1); - if ((error = xfs_bmbt_decrement(cur, 0, &i))) + if ((error = xfs_btree_decrement(cur, 0, &i))) goto done; ASSERT(i == 1); - if ((error = xfs_bmbt_delete(cur, &i))) + if ((error = 
xfs_btree_delete(cur, &i))) goto done; ASSERT(i == 1); - if ((error = xfs_bmbt_decrement(cur, 0, &i))) + if ((error = xfs_btree_decrement(cur, 0, &i))) goto done; ASSERT(i == 1); if ((error = xfs_bmbt_update(cur, LEFT.br_startoff, @@ -1427,10 +1427,10 @@ xfs_bmap_add_extent_unwritten_real( &i))) goto done; ASSERT(i == 1); - if ((error = xfs_bmbt_delete(cur, &i))) + if ((error = xfs_btree_delete(cur, &i))) goto done; ASSERT(i == 1); - if ((error = xfs_bmbt_decrement(cur, 0, &i))) + if ((error = xfs_btree_decrement(cur, 0, &i))) goto done; ASSERT(i == 1); if ((error = xfs_bmbt_update(cur, LEFT.br_startoff, @@ -1470,10 +1470,10 @@ xfs_bmap_add_extent_unwritten_real( RIGHT.br_blockcount, &i))) goto done; ASSERT(i == 1); - if ((error = xfs_bmbt_delete(cur, &i))) + if ((error = xfs_btree_delete(cur, &i))) goto done; ASSERT(i == 1); - if ((error = xfs_bmbt_decrement(cur, 0, &i))) + if ((error = xfs_btree_decrement(cur, 0, &i))) goto done; ASSERT(i == 1); if ((error = xfs_bmbt_update(cur, new->br_startoff, @@ -1556,7 +1556,7 @@ xfs_bmap_add_extent_unwritten_real( PREV.br_blockcount - new->br_blockcount, oldext))) goto done; - if ((error = xfs_bmbt_decrement(cur, 0, &i))) + if ((error = xfs_btree_decrement(cur, 0, &i))) goto done; if (xfs_bmbt_update(cur, LEFT.br_startoff, LEFT.br_startblock, @@ -1604,7 +1604,7 @@ xfs_bmap_add_extent_unwritten_real( oldext))) goto done; cur->bc_rec.b = *new; - if ((error = xfs_bmbt_insert(cur, &i))) + if ((error = xfs_btree_insert(cur, &i))) goto done; ASSERT(i == 1); } @@ -1646,7 +1646,7 @@ xfs_bmap_add_extent_unwritten_real( PREV.br_blockcount - new->br_blockcount, oldext))) goto done; - if ((error = xfs_bmbt_increment(cur, 0, &i))) + if ((error = xfs_btree_increment(cur, 0, &i))) goto done; if ((error = xfs_bmbt_update(cur, new->br_startoff, new->br_startblock, @@ -1694,7 +1694,7 @@ xfs_bmap_add_extent_unwritten_real( goto done; ASSERT(i == 0); cur->bc_rec.b.br_state = XFS_EXT_NORM; - if ((error = xfs_bmbt_insert(cur, &i))) + if ((error 
= xfs_btree_insert(cur, &i))) goto done; ASSERT(i == 1); } @@ -1742,15 +1742,15 @@ xfs_bmap_add_extent_unwritten_real( PREV.br_blockcount = new->br_startoff - PREV.br_startoff; cur->bc_rec.b = PREV; - if ((error = xfs_bmbt_insert(cur, &i))) + if ((error = xfs_btree_insert(cur, &i))) goto done; ASSERT(i == 1); - if ((error = xfs_bmbt_increment(cur, 0, &i))) + if ((error = xfs_btree_increment(cur, 0, &i))) goto done; ASSERT(i == 1); /* new middle extent - newext */ cur->bc_rec.b = *new; - if ((error = xfs_bmbt_insert(cur, &i))) + if ((error = xfs_btree_insert(cur, &i))) goto done; ASSERT(i == 1); } @@ -2098,10 +2098,10 @@ xfs_bmap_add_extent_hole_real( right.br_blockcount, &i))) goto done; ASSERT(i == 1); - if ((error = xfs_bmbt_delete(cur, &i))) + if ((error = xfs_btree_delete(cur, &i))) goto done; ASSERT(i == 1); - if ((error = xfs_bmbt_decrement(cur, 0, &i))) + if ((error = xfs_btree_decrement(cur, 0, &i))) goto done; ASSERT(i == 1); if ((error = xfs_bmbt_update(cur, left.br_startoff, @@ -2210,7 +2210,7 @@ xfs_bmap_add_extent_hole_real( goto done; ASSERT(i == 0); cur->bc_rec.b.br_state = new->br_state; - if ((error = xfs_bmbt_insert(cur, &i))) + if ((error = xfs_btree_insert(cur, &i))) goto done; ASSERT(i == 1); } @@ -2989,7 +2989,7 @@ xfs_bmap_btree_to_extents( int whichfork) /* data or attr fork */ { /* REFERENCED */ - xfs_bmbt_block_t *cblock;/* child btree block */ + xfs_btree_block_t *cblock;/* child btree block */ xfs_fsblock_t cbno; /* child block number */ xfs_buf_t *cbp; /* child block's buffer */ int error; /* error return value */ @@ -3016,7 +3016,7 @@ xfs_bmap_btree_to_extents( if ((error = xfs_btree_read_bufl(mp, tp, cbno, 0, &cbp, XFS_BMAP_BTREE_REF))) return error; - cblock = XFS_BUF_TO_BMBT_BLOCK(cbp); + cblock = XFS_BUF_TO_BLOCK(cbp); if ((error = xfs_btree_check_lblock(cur, cblock, 0, cbp))) return error; xfs_bmap_add_free(cbno, 1, cur->bc_private.b.flist, mp); @@ -3163,7 +3163,7 @@ xfs_bmap_del_extent( flags |= XFS_ILOG_FEXT(whichfork); break; } 
- if ((error = xfs_bmbt_delete(cur, &i))) + if ((error = xfs_btree_delete(cur, &i))) goto done; ASSERT(i == 1); break; @@ -3247,10 +3247,10 @@ xfs_bmap_del_extent( got.br_startblock, temp, got.br_state))) goto done; - if ((error = xfs_bmbt_increment(cur, 0, &i))) + if ((error = xfs_btree_increment(cur, 0, &i))) goto done; cur->bc_rec.b = new; - error = xfs_bmbt_insert(cur, &i); + error = xfs_btree_insert(cur, &i); if (error && error != ENOSPC) goto done; /* Index: 2.6.x-xfs-new/fs/xfs/xfs_bmap_btree.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_bmap_btree.c 2007-11-05 10:09:31.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_bmap_btree.c 2007-11-06 19:41:45.344933663 +1100 @@ -35,1466 +35,544 @@ #include "xfs_dinode.h" #include "xfs_inode.h" #include "xfs_inode_item.h" -#include "xfs_alloc.h" #include "xfs_btree.h" #include "xfs_ialloc.h" +#include "xfs_alloc.h" #include "xfs_itable.h" #include "xfs_bmap.h" #include "xfs_error.h" #include "xfs_quota.h" -#if defined(XFS_BMBT_TRACE) -ktrace_t *xfs_bmbt_trace_buf; -#endif - /* - * Prototypes for internal btree functions. + * Determine the extent state. 
*/ - - -STATIC int xfs_bmbt_killroot(xfs_btree_cur_t *); -STATIC void xfs_bmbt_log_keys(xfs_btree_cur_t *, xfs_buf_t *, int, int); -STATIC void xfs_bmbt_log_ptrs(xfs_btree_cur_t *, xfs_buf_t *, int, int); -STATIC int xfs_bmbt_lshift(xfs_btree_cur_t *, int, int *); -STATIC int xfs_bmbt_rshift(xfs_btree_cur_t *, int, int *); -STATIC int xfs_bmbt_split(xfs_btree_cur_t *, int, xfs_fsblock_t *, - __uint64_t *, xfs_btree_cur_t **, int *); -STATIC int xfs_bmbt_updkey(xfs_btree_cur_t *, xfs_bmbt_key_t *, int); - - -#if defined(XFS_BMBT_TRACE) - -static char ARGS[] = "args"; -static char ENTRY[] = "entry"; -static char ERROR[] = "error"; -#undef EXIT -static char EXIT[] = "exit"; +/* ARGSUSED */ +STATIC xfs_exntst_t +xfs_extent_state( + xfs_filblks_t blks, + int extent_flag) +{ + if (extent_flag) { + ASSERT(blks != 0); /* saved for DMIG */ + return XFS_EXT_UNWRITTEN; + } + return XFS_EXT_NORM; +} /* - * Add a trace buffer entry for the arguments given to the routine, - * generic form. + * Convert on-disk form of btree root to in-memory form. 
*/ -STATIC void -xfs_bmbt_trace_enter( - const char *func, - xfs_btree_cur_t *cur, - char *s, - int type, - int line, - __psunsigned_t a0, - __psunsigned_t a1, - __psunsigned_t a2, - __psunsigned_t a3, - __psunsigned_t a4, - __psunsigned_t a5, - __psunsigned_t a6, - __psunsigned_t a7, - __psunsigned_t a8, - __psunsigned_t a9, - __psunsigned_t a10) +void +xfs_bmdr_to_bmbt( + xfs_bmdr_block_t *dblock, + int dblocklen, + xfs_bmbt_block_t *rblock, + int rblocklen) { - xfs_inode_t *ip; - int whichfork; + int dmxr; + xfs_bmbt_key_t *fkp; + __be64 *fpp; + xfs_bmbt_key_t *tkp; + __be64 *tpp; - ip = cur->bc_private.b.ip; - whichfork = cur->bc_private.b.whichfork; - ktrace_enter(xfs_bmbt_trace_buf, - (void *)((__psint_t)type | (whichfork << 8) | (line << 16)), - (void *)func, (void *)s, (void *)ip, (void *)cur, - (void *)a0, (void *)a1, (void *)a2, (void *)a3, - (void *)a4, (void *)a5, (void *)a6, (void *)a7, - (void *)a8, (void *)a9, (void *)a10); - ASSERT(ip->i_btrace); - ktrace_enter(ip->i_btrace, - (void *)((__psint_t)type | (whichfork << 8) | (line << 16)), - (void *)func, (void *)s, (void *)ip, (void *)cur, - (void *)a0, (void *)a1, (void *)a2, (void *)a3, - (void *)a4, (void *)a5, (void *)a6, (void *)a7, - (void *)a8, (void *)a9, (void *)a10); + rblock->bb_magic = cpu_to_be32(XFS_BMAP_MAGIC); + rblock->bb_level = dblock->bb_level; + ASSERT(be16_to_cpu(rblock->bb_level) > 0); + rblock->bb_numrecs = dblock->bb_numrecs; + rblock->bb_leftsib = cpu_to_be64(NULLDFSBNO); + rblock->bb_rightsib = cpu_to_be64(NULLDFSBNO); + dmxr = (int)XFS_BTREE_BLOCK_MAXRECS(dblocklen, xfs_bmdr, 0); + fkp = XFS_BTREE_KEY_ADDR(xfs_bmdr, dblock, 1); + tkp = XFS_BMAP_BROOT_KEY_ADDR(rblock, 1, rblocklen); + fpp = XFS_BTREE_PTR_ADDR(xfs_bmdr, dblock, 1, dmxr); + tpp = XFS_BMAP_BROOT_PTR_ADDR(rblock, 1, rblocklen); + dmxr = be16_to_cpu(dblock->bb_numrecs); + memcpy(tkp, fkp, sizeof(*fkp) * dmxr); + memcpy(tpp, fpp, sizeof(*fpp) * dmxr); } + /* - * Add a trace buffer entry for arguments, for a buffer 
& 1 integer arg. + * Convert a compressed bmap extent record to an uncompressed form. + * This code must be in sync with the routines xfs_bmbt_get_startoff, + * xfs_bmbt_get_startblock, xfs_bmbt_get_blockcount and xfs_bmbt_get_state. */ -STATIC void -xfs_bmbt_trace_argbi( - const char *func, - xfs_btree_cur_t *cur, - xfs_buf_t *b, - int i, - int line) +STATIC_INLINE void +__xfs_bmbt_get_all( + __uint64_t l0, + __uint64_t l1, + xfs_bmbt_irec_t *s) { - xfs_bmbt_trace_enter(func, cur, ARGS, XFS_BMBT_KTRACE_ARGBI, line, - (__psunsigned_t)b, i, 0, 0, - 0, 0, 0, 0, - 0, 0, 0); + int ext_flag; + xfs_exntst_t st; + + ext_flag = (int)(l0 >> (64 - BMBT_EXNTFLAG_BITLEN)); + s->br_startoff = ((xfs_fileoff_t)l0 & + XFS_MASK64LO(64 - BMBT_EXNTFLAG_BITLEN)) >> 9; +#if XFS_BIG_BLKNOS + s->br_startblock = (((xfs_fsblock_t)l0 & XFS_MASK64LO(9)) << 43) | + (((xfs_fsblock_t)l1) >> 21); +#else +#ifdef DEBUG + { + xfs_dfsbno_t b; + + b = (((xfs_dfsbno_t)l0 & XFS_MASK64LO(9)) << 43) | + (((xfs_dfsbno_t)l1) >> 21); + ASSERT((b >> 32) == 0 || ISNULLDSTARTBLOCK(b)); + s->br_startblock = (xfs_fsblock_t)b; + } +#else /* !DEBUG */ + s->br_startblock = (xfs_fsblock_t)(((xfs_dfsbno_t)l1) >> 21); +#endif /* DEBUG */ +#endif /* XFS_BIG_BLKNOS */ + s->br_blockcount = (xfs_filblks_t)(l1 & XFS_MASK64LO(21)); + /* This is xfs_extent_state() in-line */ + if (ext_flag) { + ASSERT(s->br_blockcount != 0); /* saved for DMIG */ + st = XFS_EXT_UNWRITTEN; + } else + st = XFS_EXT_NORM; + s->br_state = st; } -/* - * Add a trace buffer entry for arguments, for a buffer & 2 integer args. 
- */ -STATIC void -xfs_bmbt_trace_argbii( - const char *func, - xfs_btree_cur_t *cur, - xfs_buf_t *b, - int i0, - int i1, - int line) +void +xfs_bmbt_get_all( + xfs_bmbt_rec_host_t *r, + xfs_bmbt_irec_t *s) { - xfs_bmbt_trace_enter(func, cur, ARGS, XFS_BMBT_KTRACE_ARGBII, line, - (__psunsigned_t)b, i0, i1, 0, - 0, 0, 0, 0, - 0, 0, 0); + __xfs_bmbt_get_all(r->l0, r->l1, s); } /* - * Add a trace buffer entry for arguments, for 3 block-length args - * and an integer arg. + * Extract the blockcount field from an in memory bmap extent record. */ -STATIC void -xfs_bmbt_trace_argfffi( - const char *func, - xfs_btree_cur_t *cur, - xfs_dfiloff_t o, - xfs_dfsbno_t b, - xfs_dfilblks_t i, - int j, - int line) +xfs_filblks_t +xfs_bmbt_get_blockcount( + xfs_bmbt_rec_host_t *r) { - xfs_bmbt_trace_enter(func, cur, ARGS, XFS_BMBT_KTRACE_ARGFFFI, line, - o >> 32, (int)o, b >> 32, (int)b, - i >> 32, (int)i, (int)j, 0, - 0, 0, 0); + return (xfs_filblks_t)(r->l1 & XFS_MASK64LO(21)); } /* - * Add a trace buffer entry for arguments, for one integer arg. + * Extract the startblock field from an in memory bmap extent record. */ -STATIC void -xfs_bmbt_trace_argi( - const char *func, - xfs_btree_cur_t *cur, - int i, - int line) +xfs_fsblock_t +xfs_bmbt_get_startblock( + xfs_bmbt_rec_host_t *r) { - xfs_bmbt_trace_enter(func, cur, ARGS, XFS_BMBT_KTRACE_ARGI, line, - i, 0, 0, 0, - 0, 0, 0, 0, - 0, 0, 0); +#if XFS_BIG_BLKNOS + return (((xfs_fsblock_t)r->l0 & XFS_MASK64LO(9)) << 43) | + (((xfs_fsblock_t)r->l1) >> 21); +#else +#ifdef DEBUG + xfs_dfsbno_t b; + + b = (((xfs_dfsbno_t)r->l0 & XFS_MASK64LO(9)) << 43) | + (((xfs_dfsbno_t)r->l1) >> 21); + ASSERT((b >> 32) == 0 || ISNULLDSTARTBLOCK(b)); + return (xfs_fsblock_t)b; +#else /* !DEBUG */ + return (xfs_fsblock_t)(((xfs_dfsbno_t)r->l1) >> 21); +#endif /* DEBUG */ +#endif /* XFS_BIG_BLKNOS */ } /* - * Add a trace buffer entry for arguments, for int, fsblock, key. + * Extract the startoff field from an in memory bmap extent record. 
*/ -STATIC void -xfs_bmbt_trace_argifk( - const char *func, - xfs_btree_cur_t *cur, - int i, - xfs_fsblock_t f, - xfs_dfiloff_t o, - int line) +xfs_fileoff_t +xfs_bmbt_get_startoff( + xfs_bmbt_rec_host_t *r) { - xfs_bmbt_trace_enter(func, cur, ARGS, XFS_BMBT_KTRACE_ARGIFK, line, - i, (xfs_dfsbno_t)f >> 32, (int)f, o >> 32, - (int)o, 0, 0, 0, - 0, 0, 0); + return ((xfs_fileoff_t)r->l0 & + XFS_MASK64LO(64 - BMBT_EXNTFLAG_BITLEN)) >> 9; } -/* - * Add a trace buffer entry for arguments, for int, fsblock, rec. - */ -STATIC void -xfs_bmbt_trace_argifr( - const char *func, - xfs_btree_cur_t *cur, - int i, - xfs_fsblock_t f, - xfs_bmbt_rec_t *r, - int line) +xfs_exntst_t +xfs_bmbt_get_state( + xfs_bmbt_rec_host_t *r) { - xfs_dfsbno_t b; - xfs_dfilblks_t c; - xfs_dfsbno_t d; - xfs_dfiloff_t o; - xfs_bmbt_irec_t s; - - d = (xfs_dfsbno_t)f; - xfs_bmbt_disk_get_all(r, &s); - o = (xfs_dfiloff_t)s.br_startoff; - b = (xfs_dfsbno_t)s.br_startblock; - c = s.br_blockcount; - xfs_bmbt_trace_enter(func, cur, ARGS, XFS_BMBT_KTRACE_ARGIFR, line, - i, d >> 32, (int)d, o >> 32, - (int)o, b >> 32, (int)b, c >> 32, - (int)c, 0, 0); + int ext_flag; + + ext_flag = (int)((r->l0) >> (64 - BMBT_EXNTFLAG_BITLEN)); + return xfs_extent_state(xfs_bmbt_get_blockcount(r), + ext_flag); } -/* - * Add a trace buffer entry for arguments, for int, key. - */ -STATIC void -xfs_bmbt_trace_argik( - const char *func, - xfs_btree_cur_t *cur, - int i, - xfs_bmbt_key_t *k, - int line) +/* Endian flipping versions of the bmbt extraction functions */ +void +xfs_bmbt_disk_get_all( + xfs_bmbt_rec_t *r, + xfs_bmbt_irec_t *s) { - xfs_dfiloff_t o; - - o = be64_to_cpu(k->br_startoff); - xfs_bmbt_trace_enter(func, cur, ARGS, XFS_BMBT_KTRACE_ARGIFK, line, - i, o >> 32, (int)o, 0, - 0, 0, 0, 0, - 0, 0, 0); + __xfs_bmbt_get_all(be64_to_cpu(r->l0), be64_to_cpu(r->l1), s); } /* - * Add a trace buffer entry for the cursor/operation. + * Extract the blockcount field from an on disk bmap extent record. 
*/ -STATIC void -xfs_bmbt_trace_cursor( - const char *func, - xfs_btree_cur_t *cur, - char *s, - int line) +xfs_filblks_t +xfs_bmbt_disk_get_blockcount( + xfs_bmbt_rec_t *r) { - xfs_bmbt_rec_host_t r; - - xfs_bmbt_set_all(&r, &cur->bc_rec.b); - xfs_bmbt_trace_enter(func, cur, s, XFS_BMBT_KTRACE_CUR, line, - (cur->bc_nlevels << 24) | (cur->bc_private.b.flags << 16) | - cur->bc_private.b.allocated, - r.l0 >> 32, (int)r.l0, - r.l1 >> 32, (int)r.l1, - (unsigned long)cur->bc_bufs[0], (unsigned long)cur->bc_bufs[1], - (unsigned long)cur->bc_bufs[2], (unsigned long)cur->bc_bufs[3], - (cur->bc_ptrs[0] << 16) | cur->bc_ptrs[1], - (cur->bc_ptrs[2] << 16) | cur->bc_ptrs[3]); -} - -#define XFS_BMBT_TRACE_ARGBI(c,b,i) \ - xfs_bmbt_trace_argbi(__FUNCTION__, c, b, i, __LINE__) -#define XFS_BMBT_TRACE_ARGBII(c,b,i,j) \ - xfs_bmbt_trace_argbii(__FUNCTION__, c, b, i, j, __LINE__) -#define XFS_BMBT_TRACE_ARGFFFI(c,o,b,i,j) \ - xfs_bmbt_trace_argfffi(__FUNCTION__, c, o, b, i, j, __LINE__) -#define XFS_BMBT_TRACE_ARGI(c,i) \ - xfs_bmbt_trace_argi(__FUNCTION__, c, i, __LINE__) -#define XFS_BMBT_TRACE_ARGIFK(c,i,f,s) \ - xfs_bmbt_trace_argifk(__FUNCTION__, c, i, f, s, __LINE__) -#define XFS_BMBT_TRACE_ARGIFR(c,i,f,r) \ - xfs_bmbt_trace_argifr(__FUNCTION__, c, i, f, r, __LINE__) -#define XFS_BMBT_TRACE_ARGIK(c,i,k) \ - xfs_bmbt_trace_argik(__FUNCTION__, c, i, k, __LINE__) -#define XFS_BMBT_TRACE_CURSOR(c,s) \ - xfs_bmbt_trace_cursor(__FUNCTION__, c, s, __LINE__) -#else -#define XFS_BMBT_TRACE_ARGBI(c,b,i) -#define XFS_BMBT_TRACE_ARGBII(c,b,i,j) -#define XFS_BMBT_TRACE_ARGFFFI(c,o,b,i,j) -#define XFS_BMBT_TRACE_ARGI(c,i) -#define XFS_BMBT_TRACE_ARGIFK(c,i,f,s) -#define XFS_BMBT_TRACE_ARGIFR(c,i,f,r) -#define XFS_BMBT_TRACE_ARGIK(c,i,k) -#define XFS_BMBT_TRACE_CURSOR(c,s) -#endif /* XFS_BMBT_TRACE */ - + return (xfs_filblks_t)(be64_to_cpu(r->l1) & XFS_MASK64LO(21)); +} /* - * Internal functions. + * Extract the startoff field from a disk format bmap extent record. 
*/ +xfs_fileoff_t +xfs_bmbt_disk_get_startoff( + xfs_bmbt_rec_t *r) +{ + return ((xfs_fileoff_t)be64_to_cpu(r->l0) & + XFS_MASK64LO(64 - BMBT_EXNTFLAG_BITLEN)) >> 9; +} /* - * Delete record pointed to by cur/level. + * Set all the fields in a bmap extent record from the arguments. */ -STATIC int /* error */ -xfs_bmbt_delrec( - xfs_btree_cur_t *cur, - int level, - int *stat) /* success/failure */ +void +xfs_bmbt_set_allf( + xfs_bmbt_rec_host_t *r, + xfs_fileoff_t startoff, + xfs_fsblock_t startblock, + xfs_filblks_t blockcount, + xfs_exntst_t state) { - xfs_bmbt_block_t *block; /* bmap btree block */ - xfs_fsblock_t bno; /* fs-relative block number */ - xfs_buf_t *bp; /* buffer for block */ - int error; /* error return value */ - int i; /* loop counter */ - int j; /* temp state */ - xfs_bmbt_key_t key; /* bmap btree key */ - xfs_bmbt_key_t *kp=NULL; /* pointer to bmap btree key */ - xfs_fsblock_t lbno; /* left sibling block number */ - xfs_buf_t *lbp; /* left buffer pointer */ - xfs_bmbt_block_t *left; /* left btree block */ - xfs_bmbt_key_t *lkp; /* left btree key */ - xfs_bmbt_ptr_t *lpp; /* left address pointer */ - int lrecs=0; /* left record count */ - xfs_bmbt_rec_t *lrp; /* left record pointer */ - xfs_mount_t *mp; /* file system mount point */ - xfs_bmbt_ptr_t *pp; /* pointer to bmap block addr */ - int ptr; /* key/record index */ - xfs_fsblock_t rbno; /* right sibling block number */ - xfs_buf_t *rbp; /* right buffer pointer */ - xfs_bmbt_block_t *right; /* right btree block */ - xfs_bmbt_key_t *rkp; /* right btree key */ - xfs_bmbt_rec_t *rp; /* pointer to bmap btree rec */ - xfs_bmbt_ptr_t *rpp; /* right address pointer */ - xfs_bmbt_block_t *rrblock; /* right-right btree block */ - xfs_buf_t *rrbp; /* right-right buffer pointer */ - int rrecs=0; /* right record count */ - xfs_bmbt_rec_t *rrp; /* right record pointer */ - xfs_btree_cur_t *tcur; /* temporary btree cursor */ - int numrecs; /* temporary numrec count */ - int numlrecs, numrrecs; - - 
XFS_BMBT_TRACE_CURSOR(cur, ENTRY); - XFS_BMBT_TRACE_ARGI(cur, level); - ptr = cur->bc_ptrs[level]; - tcur = NULL; - if (ptr == 0) { - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 0; - return 0; - } - block = xfs_bmbt_get_block(cur, level, &bp); - numrecs = be16_to_cpu(block->bb_numrecs); -#ifdef DEBUG - if ((error = xfs_btree_check_lblock(cur, block, level, bp))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - goto error0; - } -#endif - if (ptr > numrecs) { - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 0; - return 0; - } - XFS_STATS_INC(xs_bmbt_delrec); - if (level > 0) { - kp = XFS_BMAP_KEY_IADDR(block, 1, cur); - pp = XFS_BMAP_PTR_IADDR(block, 1, cur); -#ifdef DEBUG - for (i = ptr; i < numrecs; i++) { - if ((error = xfs_btree_check_lptr_disk(cur, pp[i], level))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - goto error0; - } - } -#endif - if (ptr < numrecs) { - memmove(&kp[ptr - 1], &kp[ptr], - (numrecs - ptr) * sizeof(*kp)); - memmove(&pp[ptr - 1], &pp[ptr], - (numrecs - ptr) * sizeof(*pp)); - xfs_bmbt_log_ptrs(cur, bp, ptr, numrecs - 1); - xfs_bmbt_log_keys(cur, bp, ptr, numrecs - 1); - } - } else { - rp = XFS_BMAP_REC_IADDR(block, 1, cur); - if (ptr < numrecs) { - memmove(&rp[ptr - 1], &rp[ptr], - (numrecs - ptr) * sizeof(*rp)); - xfs_bmbt_log_recs(cur, bp, ptr, numrecs - 1); - } - if (ptr == 1) { - key.br_startoff = - cpu_to_be64(xfs_bmbt_disk_get_startoff(rp)); - kp = &key; - } - } - numrecs--; - block->bb_numrecs = cpu_to_be16(numrecs); - xfs_bmbt_log_block(cur, bp, XFS_BB_NUMRECS); - /* - * We're at the root level. - * First, shrink the root block in-memory. - * Try to get rid of the next level down. - * If we can't then there's nothing left to do. 
-	 */
-	if (level == cur->bc_nlevels - 1) {
-		xfs_iroot_realloc(cur->bc_private.b.ip, -1,
-			cur->bc_private.b.whichfork);
-		if ((error = xfs_bmbt_killroot(cur))) {
-			XFS_BMBT_TRACE_CURSOR(cur, ERROR);
-			goto error0;
-		}
-		if (level > 0 && (error = xfs_bmbt_decrement(cur, level, &j))) {
-			XFS_BMBT_TRACE_CURSOR(cur, ERROR);
-			goto error0;
-		}
-		XFS_BMBT_TRACE_CURSOR(cur, EXIT);
-		*stat = 1;
-		return 0;
-	}
-	if (ptr == 1 && (error = xfs_bmbt_updkey(cur, kp, level + 1))) {
-		XFS_BMBT_TRACE_CURSOR(cur, ERROR);
-		goto error0;
-	}
-	if (numrecs >= XFS_BMAP_BLOCK_IMINRECS(level, cur)) {
-		if (level > 0 && (error = xfs_bmbt_decrement(cur, level, &j))) {
-			XFS_BMBT_TRACE_CURSOR(cur, ERROR);
-			goto error0;
-		}
-		XFS_BMBT_TRACE_CURSOR(cur, EXIT);
-		*stat = 1;
-		return 0;
-	}
-	rbno = be64_to_cpu(block->bb_rightsib);
-	lbno = be64_to_cpu(block->bb_leftsib);
-	/*
-	 * One child of root, need to get a chance to copy its contents
-	 * into the root and delete it. Can't go up to next level,
-	 * there's nothing to delete there.
- */ - if (lbno == NULLFSBLOCK && rbno == NULLFSBLOCK && - level == cur->bc_nlevels - 2) { - if ((error = xfs_bmbt_killroot(cur))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - goto error0; - } - if (level > 0 && (error = xfs_bmbt_decrement(cur, level, &i))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - goto error0; - } - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 1; - return 0; - } - ASSERT(rbno != NULLFSBLOCK || lbno != NULLFSBLOCK); - if ((error = xfs_btree_dup_cursor(cur, &tcur))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - goto error0; - } - bno = NULLFSBLOCK; - if (rbno != NULLFSBLOCK) { - i = xfs_btree_lastrec(tcur, level); - XFS_WANT_CORRUPTED_GOTO(i == 1, error0); - if ((error = xfs_bmbt_increment(tcur, level, &i))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - goto error0; - } - XFS_WANT_CORRUPTED_GOTO(i == 1, error0); - i = xfs_btree_lastrec(tcur, level); - XFS_WANT_CORRUPTED_GOTO(i == 1, error0); - rbp = tcur->bc_bufs[level]; - right = XFS_BUF_TO_BMBT_BLOCK(rbp); -#ifdef DEBUG - if ((error = xfs_btree_check_lblock(cur, right, level, rbp))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - goto error0; - } -#endif - bno = be64_to_cpu(right->bb_leftsib); - if (be16_to_cpu(right->bb_numrecs) - 1 >= - XFS_BMAP_BLOCK_IMINRECS(level, cur)) { - if ((error = xfs_bmbt_lshift(tcur, level, &i))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - goto error0; - } - if (i) { - ASSERT(be16_to_cpu(block->bb_numrecs) >= - XFS_BMAP_BLOCK_IMINRECS(level, tcur)); - xfs_btree_del_cursor(tcur, XFS_BTREE_NOERROR); - tcur = NULL; - if (level > 0) { - if ((error = xfs_bmbt_decrement(cur, - level, &i))) { - XFS_BMBT_TRACE_CURSOR(cur, - ERROR); - goto error0; - } - } - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 1; - return 0; - } - } - rrecs = be16_to_cpu(right->bb_numrecs); - if (lbno != NULLFSBLOCK) { - i = xfs_btree_firstrec(tcur, level); - XFS_WANT_CORRUPTED_GOTO(i == 1, error0); - if ((error = xfs_bmbt_decrement(tcur, level, &i))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - goto error0; - } - 
XFS_WANT_CORRUPTED_GOTO(i == 1, error0); - } - } - if (lbno != NULLFSBLOCK) { - i = xfs_btree_firstrec(tcur, level); - XFS_WANT_CORRUPTED_GOTO(i == 1, error0); - /* - * decrement to last in block - */ - if ((error = xfs_bmbt_decrement(tcur, level, &i))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - goto error0; - } - i = xfs_btree_firstrec(tcur, level); - XFS_WANT_CORRUPTED_GOTO(i == 1, error0); - lbp = tcur->bc_bufs[level]; - left = XFS_BUF_TO_BMBT_BLOCK(lbp); -#ifdef DEBUG - if ((error = xfs_btree_check_lblock(cur, left, level, lbp))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - goto error0; - } -#endif - bno = be64_to_cpu(left->bb_rightsib); - if (be16_to_cpu(left->bb_numrecs) - 1 >= - XFS_BMAP_BLOCK_IMINRECS(level, cur)) { - if ((error = xfs_bmbt_rshift(tcur, level, &i))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - goto error0; - } - if (i) { - ASSERT(be16_to_cpu(block->bb_numrecs) >= - XFS_BMAP_BLOCK_IMINRECS(level, tcur)); - xfs_btree_del_cursor(tcur, XFS_BTREE_NOERROR); - tcur = NULL; - if (level == 0) - cur->bc_ptrs[0]++; - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 1; - return 0; - } - } - lrecs = be16_to_cpu(left->bb_numrecs); - } - xfs_btree_del_cursor(tcur, XFS_BTREE_NOERROR); - tcur = NULL; - mp = cur->bc_mp; - ASSERT(bno != NULLFSBLOCK); - if (lbno != NULLFSBLOCK && - lrecs + be16_to_cpu(block->bb_numrecs) <= XFS_BMAP_BLOCK_IMAXRECS(level, cur)) { - rbno = bno; - right = block; - rbp = bp; - if ((error = xfs_btree_read_bufl(mp, cur->bc_tp, lbno, 0, &lbp, - XFS_BMAP_BTREE_REF))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - goto error0; - } - left = XFS_BUF_TO_BMBT_BLOCK(lbp); - if ((error = xfs_btree_check_lblock(cur, left, level, lbp))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - goto error0; - } - } else if (rbno != NULLFSBLOCK && - rrecs + be16_to_cpu(block->bb_numrecs) <= - XFS_BMAP_BLOCK_IMAXRECS(level, cur)) { - lbno = bno; - left = block; - lbp = bp; - if ((error = xfs_btree_read_bufl(mp, cur->bc_tp, rbno, 0, &rbp, - XFS_BMAP_BTREE_REF))) { - 
XFS_BMBT_TRACE_CURSOR(cur, ERROR); - goto error0; - } - right = XFS_BUF_TO_BMBT_BLOCK(rbp); - if ((error = xfs_btree_check_lblock(cur, right, level, rbp))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - goto error0; - } - lrecs = be16_to_cpu(left->bb_numrecs); - } else { - if (level > 0 && (error = xfs_bmbt_decrement(cur, level, &i))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - goto error0; - } - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 1; - return 0; - } - numlrecs = be16_to_cpu(left->bb_numrecs); - numrrecs = be16_to_cpu(right->bb_numrecs); - if (level > 0) { - lkp = XFS_BMAP_KEY_IADDR(left, numlrecs + 1, cur); - lpp = XFS_BMAP_PTR_IADDR(left, numlrecs + 1, cur); - rkp = XFS_BMAP_KEY_IADDR(right, 1, cur); - rpp = XFS_BMAP_PTR_IADDR(right, 1, cur); -#ifdef DEBUG - for (i = 0; i < numrrecs; i++) { - if ((error = xfs_btree_check_lptr_disk(cur, rpp[i], level))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - goto error0; - } - } -#endif - memcpy(lkp, rkp, numrrecs * sizeof(*lkp)); - memcpy(lpp, rpp, numrrecs * sizeof(*lpp)); - xfs_bmbt_log_keys(cur, lbp, numlrecs + 1, numlrecs + numrrecs); - xfs_bmbt_log_ptrs(cur, lbp, numlrecs + 1, numlrecs + numrrecs); + int extent_flag = (state == XFS_EXT_NORM) ? 
0 : 1; + + ASSERT(state == XFS_EXT_NORM || state == XFS_EXT_UNWRITTEN); + ASSERT((startoff & XFS_MASK64HI(64-BMBT_STARTOFF_BITLEN)) == 0); + ASSERT((blockcount & XFS_MASK64HI(64-BMBT_BLOCKCOUNT_BITLEN)) == 0); + +#if XFS_BIG_BLKNOS + ASSERT((startblock & XFS_MASK64HI(64-BMBT_STARTBLOCK_BITLEN)) == 0); + + r->l0 = ((xfs_bmbt_rec_base_t)extent_flag << 63) | + ((xfs_bmbt_rec_base_t)startoff << 9) | + ((xfs_bmbt_rec_base_t)startblock >> 43); + r->l1 = ((xfs_bmbt_rec_base_t)startblock << 21) | + ((xfs_bmbt_rec_base_t)blockcount & + (xfs_bmbt_rec_base_t)XFS_MASK64LO(21)); +#else /* !XFS_BIG_BLKNOS */ + if (ISNULLSTARTBLOCK(startblock)) { + r->l0 = ((xfs_bmbt_rec_base_t)extent_flag << 63) | + ((xfs_bmbt_rec_base_t)startoff << 9) | + (xfs_bmbt_rec_base_t)XFS_MASK64LO(9); + r->l1 = XFS_MASK64HI(11) | + ((xfs_bmbt_rec_base_t)startblock << 21) | + ((xfs_bmbt_rec_base_t)blockcount & + (xfs_bmbt_rec_base_t)XFS_MASK64LO(21)); } else { - lrp = XFS_BMAP_REC_IADDR(left, numlrecs + 1, cur); - rrp = XFS_BMAP_REC_IADDR(right, 1, cur); - memcpy(lrp, rrp, numrrecs * sizeof(*lrp)); - xfs_bmbt_log_recs(cur, lbp, numlrecs + 1, numlrecs + numrrecs); - } - be16_add(&left->bb_numrecs, numrrecs); - left->bb_rightsib = right->bb_rightsib; - xfs_bmbt_log_block(cur, lbp, XFS_BB_RIGHTSIB | XFS_BB_NUMRECS); - if (be64_to_cpu(left->bb_rightsib) != NULLDFSBNO) { - if ((error = xfs_btree_read_bufl(mp, cur->bc_tp, - be64_to_cpu(left->bb_rightsib), - 0, &rrbp, XFS_BMAP_BTREE_REF))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - goto error0; - } - rrblock = XFS_BUF_TO_BMBT_BLOCK(rrbp); - if ((error = xfs_btree_check_lblock(cur, rrblock, level, rrbp))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - goto error0; - } - rrblock->bb_leftsib = cpu_to_be64(lbno); - xfs_bmbt_log_block(cur, rrbp, XFS_BB_LEFTSIB); - } - xfs_bmap_add_free(XFS_DADDR_TO_FSB(mp, XFS_BUF_ADDR(rbp)), 1, - cur->bc_private.b.flist, mp); - cur->bc_private.b.ip->i_d.di_nblocks--; - xfs_trans_log_inode(cur->bc_tp, cur->bc_private.b.ip, XFS_ILOG_CORE); - 
XFS_TRANS_MOD_DQUOT_BYINO(mp, cur->bc_tp, cur->bc_private.b.ip, - XFS_TRANS_DQ_BCOUNT, -1L); - xfs_trans_binval(cur->bc_tp, rbp); - if (bp != lbp) { - cur->bc_bufs[level] = lbp; - cur->bc_ptrs[level] += lrecs; - cur->bc_ra[level] = 0; - } else if ((error = xfs_bmbt_increment(cur, level + 1, &i))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - goto error0; + r->l0 = ((xfs_bmbt_rec_base_t)extent_flag << 63) | + ((xfs_bmbt_rec_base_t)startoff << 9); + r->l1 = ((xfs_bmbt_rec_base_t)startblock << 21) | + ((xfs_bmbt_rec_base_t)blockcount & + (xfs_bmbt_rec_base_t)XFS_MASK64LO(21)); } - if (level > 0) - cur->bc_ptrs[level]--; - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 2; - return 0; - -error0: - if (tcur) - xfs_btree_del_cursor(tcur, XFS_BTREE_ERROR); - return error; +#endif /* XFS_BIG_BLKNOS */ } /* - * Insert one record/level. Return information to the caller - * allowing the next level up to proceed if necessary. + * Set all the fields in a bmap extent record from the uncompressed form. */ -STATIC int /* error */ -xfs_bmbt_insrec( - xfs_btree_cur_t *cur, - int level, - xfs_fsblock_t *bnop, - xfs_bmbt_rec_t *recp, - xfs_btree_cur_t **curp, - int *stat) /* no-go/done/continue */ +void +xfs_bmbt_set_all( + xfs_bmbt_rec_host_t *r, + xfs_bmbt_irec_t *s) { - xfs_bmbt_block_t *block; /* bmap btree block */ - xfs_buf_t *bp; /* buffer for block */ - int error; /* error return value */ - int i; /* loop index */ - xfs_bmbt_key_t key; /* bmap btree key */ - xfs_bmbt_key_t *kp=NULL; /* pointer to bmap btree key */ - int logflags; /* inode logging flags */ - xfs_fsblock_t nbno; /* new block number */ - struct xfs_btree_cur *ncur; /* new btree cursor */ - __uint64_t startoff; /* new btree key value */ - xfs_bmbt_rec_t nrec; /* new record count */ - int optr; /* old key/record index */ - xfs_bmbt_ptr_t *pp; /* pointer to bmap block addr */ - int ptr; /* key/record index */ - xfs_bmbt_rec_t *rp=NULL; /* pointer to bmap btree rec */ - int numrecs; - - ASSERT(level < cur->bc_nlevels); - 
XFS_BMBT_TRACE_CURSOR(cur, ENTRY); - XFS_BMBT_TRACE_ARGIFR(cur, level, *bnop, recp); - ncur = NULL; - key.br_startoff = cpu_to_be64(xfs_bmbt_disk_get_startoff(recp)); - optr = ptr = cur->bc_ptrs[level]; - if (ptr == 0) { - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 0; - return 0; - } - XFS_STATS_INC(xs_bmbt_insrec); - block = xfs_bmbt_get_block(cur, level, &bp); - numrecs = be16_to_cpu(block->bb_numrecs); -#ifdef DEBUG - if ((error = xfs_btree_check_lblock(cur, block, level, bp))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } - if (ptr <= numrecs) { - if (level == 0) { - rp = XFS_BMAP_REC_IADDR(block, ptr, cur); - xfs_btree_check_rec(XFS_BTNUM_BMAP, recp, rp); - } else { - kp = XFS_BMAP_KEY_IADDR(block, ptr, cur); - xfs_btree_check_key(XFS_BTNUM_BMAP, &key, kp); - } - } -#endif - nbno = NULLFSBLOCK; - if (numrecs == XFS_BMAP_BLOCK_IMAXRECS(level, cur)) { - if (numrecs < XFS_BMAP_BLOCK_DMAXRECS(level, cur)) { - /* - * A root block, that can be made bigger. - */ - xfs_iroot_realloc(cur->bc_private.b.ip, 1, - cur->bc_private.b.whichfork); - block = xfs_bmbt_get_block(cur, level, &bp); - } else if (level == cur->bc_nlevels - 1) { - if ((error = xfs_bmbt_newroot(cur, &logflags, stat)) || - *stat == 0) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } - xfs_trans_log_inode(cur->bc_tp, cur->bc_private.b.ip, - logflags); - block = xfs_bmbt_get_block(cur, level, &bp); - } else { - if ((error = xfs_bmbt_rshift(cur, level, &i))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } - if (i) { - /* nothing */ - } else { - if ((error = xfs_bmbt_lshift(cur, level, &i))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } - if (i) { - optr = ptr = cur->bc_ptrs[level]; - } else { - if ((error = xfs_bmbt_split(cur, level, - &nbno, &startoff, &ncur, - &i))) { - XFS_BMBT_TRACE_CURSOR(cur, - ERROR); - return error; - } - if (i) { - block = xfs_bmbt_get_block( - cur, level, &bp); -#ifdef DEBUG - if ((error = - xfs_btree_check_lblock(cur, - block, 
level, bp))) { - XFS_BMBT_TRACE_CURSOR( - cur, ERROR); - return error; - } -#endif - ptr = cur->bc_ptrs[level]; - xfs_bmbt_disk_set_allf(&nrec, - startoff, 0, 0, - XFS_EXT_NORM); - } else { - XFS_BMBT_TRACE_CURSOR(cur, - EXIT); - *stat = 0; - return 0; - } - } - } - } - } - numrecs = be16_to_cpu(block->bb_numrecs); - if (level > 0) { - kp = XFS_BMAP_KEY_IADDR(block, 1, cur); - pp = XFS_BMAP_PTR_IADDR(block, 1, cur); -#ifdef DEBUG - for (i = numrecs; i >= ptr; i--) { - if ((error = xfs_btree_check_lptr_disk(cur, pp[i - 1], - level))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } - } -#endif - memmove(&kp[ptr], &kp[ptr - 1], - (numrecs - ptr + 1) * sizeof(*kp)); - memmove(&pp[ptr], &pp[ptr - 1], - (numrecs - ptr + 1) * sizeof(*pp)); -#ifdef DEBUG - if ((error = xfs_btree_check_lptr(cur, *bnop, level))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } -#endif - kp[ptr - 1] = key; - pp[ptr - 1] = cpu_to_be64(*bnop); - numrecs++; - block->bb_numrecs = cpu_to_be16(numrecs); - xfs_bmbt_log_keys(cur, bp, ptr, numrecs); - xfs_bmbt_log_ptrs(cur, bp, ptr, numrecs); - } else { - rp = XFS_BMAP_REC_IADDR(block, 1, cur); - memmove(&rp[ptr], &rp[ptr - 1], - (numrecs - ptr + 1) * sizeof(*rp)); - rp[ptr - 1] = *recp; - numrecs++; - block->bb_numrecs = cpu_to_be16(numrecs); - xfs_bmbt_log_recs(cur, bp, ptr, numrecs); - } - xfs_bmbt_log_block(cur, bp, XFS_BB_NUMRECS); -#ifdef DEBUG - if (ptr < numrecs) { - if (level == 0) - xfs_btree_check_rec(XFS_BTNUM_BMAP, rp + ptr - 1, - rp + ptr); - else - xfs_btree_check_key(XFS_BTNUM_BMAP, kp + ptr - 1, - kp + ptr); - } -#endif - if (optr == 1 && (error = xfs_bmbt_updkey(cur, &key, level + 1))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } - *bnop = nbno; - if (nbno != NULLFSBLOCK) { - *recp = nrec; - *curp = ncur; - } - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 1; - return 0; + xfs_bmbt_set_allf(r, s->br_startoff, s->br_startblock, + s->br_blockcount, s->br_state); } -STATIC int -xfs_bmbt_killroot( - 
xfs_btree_cur_t *cur) + +/* + * Set all the fields in a disk format bmap extent record from the arguments. + */ +void +xfs_bmbt_disk_set_allf( + xfs_bmbt_rec_t *r, + xfs_fileoff_t startoff, + xfs_fsblock_t startblock, + xfs_filblks_t blockcount, + xfs_exntst_t state) { - xfs_bmbt_block_t *block; - xfs_bmbt_block_t *cblock; - xfs_buf_t *cbp; - xfs_bmbt_key_t *ckp; - xfs_bmbt_ptr_t *cpp; -#ifdef DEBUG - int error; -#endif - int i; - xfs_bmbt_key_t *kp; - xfs_inode_t *ip; - xfs_ifork_t *ifp; - int level; - xfs_bmbt_ptr_t *pp; + int extent_flag = (state == XFS_EXT_NORM) ? 0 : 1; - XFS_BMBT_TRACE_CURSOR(cur, ENTRY); - level = cur->bc_nlevels - 1; - ASSERT(level >= 1); - /* - * Don't deal with the root block needs to be a leaf case. - * We're just going to turn the thing back into extents anyway. - */ - if (level == 1) { - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - return 0; - } - block = xfs_bmbt_get_block(cur, level, &cbp); - /* - * Give up if the root has multiple children. - */ - if (be16_to_cpu(block->bb_numrecs) != 1) { - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - return 0; - } - /* - * Only do this if the next level will fit. - * Then the data must be copied up to the inode, - * instead of freeing the root you free the next level. 
-	 */
-	cbp = cur->bc_bufs[level - 1];
-	cblock = XFS_BUF_TO_BMBT_BLOCK(cbp);
-	if (be16_to_cpu(cblock->bb_numrecs) > XFS_BMAP_BLOCK_DMAXRECS(level, cur)) {
-		XFS_BMBT_TRACE_CURSOR(cur, EXIT);
-		return 0;
-	}
-	ASSERT(be64_to_cpu(cblock->bb_leftsib) == NULLDFSBNO);
-	ASSERT(be64_to_cpu(cblock->bb_rightsib) == NULLDFSBNO);
-	ip = cur->bc_private.b.ip;
-	ifp = XFS_IFORK_PTR(ip, cur->bc_private.b.whichfork);
-	ASSERT(XFS_BMAP_BLOCK_IMAXRECS(level, cur) ==
-	       XFS_BMAP_BROOT_MAXRECS(ifp->if_broot_bytes));
-	i = (int)(be16_to_cpu(cblock->bb_numrecs) - XFS_BMAP_BLOCK_IMAXRECS(level, cur));
-	if (i) {
-		xfs_iroot_realloc(ip, i, cur->bc_private.b.whichfork);
-		block = ifp->if_broot;
-	}
-	be16_add(&block->bb_numrecs, i);
-	ASSERT(block->bb_numrecs == cblock->bb_numrecs);
-	kp = XFS_BMAP_KEY_IADDR(block, 1, cur);
-	ckp = XFS_BMAP_KEY_IADDR(cblock, 1, cur);
-	memcpy(kp, ckp, be16_to_cpu(block->bb_numrecs) * sizeof(*kp));
-	pp = XFS_BMAP_PTR_IADDR(block, 1, cur);
-	cpp = XFS_BMAP_PTR_IADDR(cblock, 1, cur);
-#ifdef DEBUG
-	for (i = 0; i < be16_to_cpu(cblock->bb_numrecs); i++) {
-		if ((error = xfs_btree_check_lptr_disk(cur, cpp[i], level - 1))) {
-			XFS_BMBT_TRACE_CURSOR(cur, ERROR);
-			return error;
-		}
+	ASSERT(state == XFS_EXT_NORM || state == XFS_EXT_UNWRITTEN);
+	ASSERT((startoff & XFS_MASK64HI(64-BMBT_STARTOFF_BITLEN)) == 0);
+	ASSERT((blockcount & XFS_MASK64HI(64-BMBT_BLOCKCOUNT_BITLEN)) == 0);
+
+#if XFS_BIG_BLKNOS
+	ASSERT((startblock & XFS_MASK64HI(64-BMBT_STARTBLOCK_BITLEN)) == 0);
+
+	r->l0 = cpu_to_be64(
+		((xfs_bmbt_rec_base_t)extent_flag << 63) |
+		((xfs_bmbt_rec_base_t)startoff << 9) |
+		((xfs_bmbt_rec_base_t)startblock >> 43));
+	r->l1 = cpu_to_be64(
+		((xfs_bmbt_rec_base_t)startblock << 21) |
+		((xfs_bmbt_rec_base_t)blockcount &
+		 (xfs_bmbt_rec_base_t)XFS_MASK64LO(21)));
+#else /* !XFS_BIG_BLKNOS */
+	if (ISNULLSTARTBLOCK(startblock)) {
+		r->l0 = cpu_to_be64(
+			((xfs_bmbt_rec_base_t)extent_flag << 63) |
+			((xfs_bmbt_rec_base_t)startoff << 9) |
+			(xfs_bmbt_rec_base_t)XFS_MASK64LO(9));
+		r->l1 = cpu_to_be64(XFS_MASK64HI(11) |
+			((xfs_bmbt_rec_base_t)startblock << 21) |
+			((xfs_bmbt_rec_base_t)blockcount &
+			 (xfs_bmbt_rec_base_t)XFS_MASK64LO(21)));
+	} else {
+		r->l0 = cpu_to_be64(
+			((xfs_bmbt_rec_base_t)extent_flag << 63) |
+			((xfs_bmbt_rec_base_t)startoff << 9));
+		r->l1 = cpu_to_be64(
+			((xfs_bmbt_rec_base_t)startblock << 21) |
+			((xfs_bmbt_rec_base_t)blockcount &
+			 (xfs_bmbt_rec_base_t)XFS_MASK64LO(21)));
 	}
-#endif
-	memcpy(pp, cpp, be16_to_cpu(block->bb_numrecs) * sizeof(*pp));
-	xfs_bmap_add_free(XFS_DADDR_TO_FSB(cur->bc_mp, XFS_BUF_ADDR(cbp)), 1,
-			cur->bc_private.b.flist, cur->bc_mp);
-	ip->i_d.di_nblocks--;
-	XFS_TRANS_MOD_DQUOT_BYINO(cur->bc_mp, cur->bc_tp, ip,
-			XFS_TRANS_DQ_BCOUNT, -1L);
-	xfs_trans_binval(cur->bc_tp, cbp);
-	cur->bc_bufs[level - 1] = NULL;
-	be16_add(&block->bb_level, -1);
-	xfs_trans_log_inode(cur->bc_tp, ip,
-		XFS_ILOG_CORE | XFS_ILOG_FBROOT(cur->bc_private.b.whichfork));
-	cur->bc_nlevels--;
-	XFS_BMBT_TRACE_CURSOR(cur, EXIT);
-	return 0;
+#endif /* XFS_BIG_BLKNOS */
 }

 /*
- * Log key values from the btree block.
+ * Set all the fields in a bmap extent record from the uncompressed form.
  */
-STATIC void
-xfs_bmbt_log_keys(
-	xfs_btree_cur_t	*cur,
-	xfs_buf_t	*bp,
-	int		kfirst,
-	int		klast)
+void
+xfs_bmbt_disk_set_all(
+	xfs_bmbt_rec_t	*r,
+	xfs_bmbt_irec_t	*s)
 {
-	xfs_trans_t	*tp;
+	xfs_bmbt_disk_set_allf(r, s->br_startoff, s->br_startblock,
+		s->br_blockcount, s->br_state);
+}

-	XFS_BMBT_TRACE_CURSOR(cur, ENTRY);
-	XFS_BMBT_TRACE_ARGBII(cur, bp, kfirst, klast);
-	tp = cur->bc_tp;
-	if (bp) {
-		xfs_bmbt_block_t	*block;
-		int			first;
-		xfs_bmbt_key_t		*kp;
-		int			last;
+/*
+ * Set the blockcount field in a bmap extent record.
+ */
+void
+xfs_bmbt_set_blockcount(
+	xfs_bmbt_rec_host_t *r,
+	xfs_filblks_t	v)
+{
+	ASSERT((v & XFS_MASK64HI(43)) == 0);
+	r->l1 = (r->l1 & (xfs_bmbt_rec_base_t)XFS_MASK64HI(43)) |
+		(xfs_bmbt_rec_base_t)(v & XFS_MASK64LO(21));
+}

-		block = XFS_BUF_TO_BMBT_BLOCK(bp);
-		kp = XFS_BMAP_KEY_DADDR(block, 1, cur);
-		first = (int)((xfs_caddr_t)&kp[kfirst - 1] - (xfs_caddr_t)block);
-		last = (int)(((xfs_caddr_t)&kp[klast] - 1) - (xfs_caddr_t)block);
-		xfs_trans_log_buf(tp, bp, first, last);
+/*
+ * Set the startblock field in a bmap extent record.
+ */
+void
+xfs_bmbt_set_startblock(
+	xfs_bmbt_rec_host_t *r,
+	xfs_fsblock_t	v)
+{
+#if XFS_BIG_BLKNOS
+	ASSERT((v & XFS_MASK64HI(12)) == 0);
+	r->l0 = (r->l0 & (xfs_bmbt_rec_base_t)XFS_MASK64HI(55)) |
+		(xfs_bmbt_rec_base_t)(v >> 43);
+	r->l1 = (r->l1 & (xfs_bmbt_rec_base_t)XFS_MASK64LO(21)) |
+		(xfs_bmbt_rec_base_t)(v << 21);
+#else /* !XFS_BIG_BLKNOS */
+	if (ISNULLSTARTBLOCK(v)) {
+		r->l0 |= (xfs_bmbt_rec_base_t)XFS_MASK64LO(9);
+		r->l1 = (xfs_bmbt_rec_base_t)XFS_MASK64HI(11) |
+			((xfs_bmbt_rec_base_t)v << 21) |
+			(r->l1 & (xfs_bmbt_rec_base_t)XFS_MASK64LO(21));
 	} else {
-		xfs_inode_t	*ip;
-
-		ip = cur->bc_private.b.ip;
-		xfs_trans_log_inode(tp, ip,
-			XFS_ILOG_FBROOT(cur->bc_private.b.whichfork));
+		r->l0 &= ~(xfs_bmbt_rec_base_t)XFS_MASK64LO(9);
+		r->l1 = ((xfs_bmbt_rec_base_t)v << 21) |
+			(r->l1 & (xfs_bmbt_rec_base_t)XFS_MASK64LO(21));
 	}
-	XFS_BMBT_TRACE_CURSOR(cur, EXIT);
+#endif /* XFS_BIG_BLKNOS */
 }

 /*
- * Log pointer values from the btree block.
+ * Set the startoff field in a bmap extent record.
  */
-STATIC void
-xfs_bmbt_log_ptrs(
-	xfs_btree_cur_t	*cur,
-	xfs_buf_t	*bp,
-	int		pfirst,
-	int		plast)
+void
+xfs_bmbt_set_startoff(
+	xfs_bmbt_rec_host_t *r,
+	xfs_fileoff_t	v)
 {
-	xfs_trans_t	*tp;
+	ASSERT((v & XFS_MASK64HI(9)) == 0);
+	r->l0 = (r->l0 & (xfs_bmbt_rec_base_t) XFS_MASK64HI(1)) |
+		((xfs_bmbt_rec_base_t)v << 9) |
+		(r->l0 & (xfs_bmbt_rec_base_t)XFS_MASK64LO(9));
+}

-	XFS_BMBT_TRACE_CURSOR(cur, ENTRY);
-	XFS_BMBT_TRACE_ARGBII(cur, bp, pfirst, plast);
-	tp = cur->bc_tp;
-	if (bp) {
-		xfs_bmbt_block_t	*block;
-		int			first;
-		int			last;
-		xfs_bmbt_ptr_t		*pp;
+/*
+ * Set the extent state field in a bmap extent record.
+ */
+void
+xfs_bmbt_set_state(
+	xfs_bmbt_rec_host_t *r,
+	xfs_exntst_t	v)
+{
+	ASSERT(v == XFS_EXT_NORM || v == XFS_EXT_UNWRITTEN);
+	if (v == XFS_EXT_NORM)
+		r->l0 &= XFS_MASK64LO(64 - BMBT_EXNTFLAG_BITLEN);
+	else
+		r->l0 |= XFS_MASK64HI(BMBT_EXNTFLAG_BITLEN);
+}

-		block = XFS_BUF_TO_BMBT_BLOCK(bp);
-		pp = XFS_BMAP_PTR_DADDR(block, 1, cur);
-		first = (int)((xfs_caddr_t)&pp[pfirst - 1] - (xfs_caddr_t)block);
-		last = (int)(((xfs_caddr_t)&pp[plast] - 1) - (xfs_caddr_t)block);
-		xfs_trans_log_buf(tp, bp, first, last);
-	} else {
-		xfs_inode_t	*ip;
+/*
+ * Convert in-memory form of btree root to on-disk form.
+ */
+void
+xfs_bmbt_to_bmdr(
+	xfs_bmbt_block_t	*rblock,
+	int			rblocklen,
+	xfs_bmdr_block_t	*dblock,
+	int			dblocklen)
+{
+	int			dmxr;
+	xfs_bmbt_key_t		*fkp;
+	__be64			*fpp;
+	xfs_bmbt_key_t		*tkp;
+	__be64			*tpp;

-		ip = cur->bc_private.b.ip;
-		xfs_trans_log_inode(tp, ip,
-			XFS_ILOG_FBROOT(cur->bc_private.b.whichfork));
-	}
-	XFS_BMBT_TRACE_CURSOR(cur, EXIT);
+	ASSERT(be32_to_cpu(rblock->bb_magic) == XFS_BMAP_MAGIC);
+	ASSERT(be64_to_cpu(rblock->bb_leftsib) == NULLDFSBNO);
+	ASSERT(be64_to_cpu(rblock->bb_rightsib) == NULLDFSBNO);
+	ASSERT(be16_to_cpu(rblock->bb_level) > 0);
+	dblock->bb_level = rblock->bb_level;
+	dblock->bb_numrecs = rblock->bb_numrecs;
+	dmxr = (int)XFS_BTREE_BLOCK_MAXRECS(dblocklen, xfs_bmdr, 0);
+	fkp = XFS_BMAP_BROOT_KEY_ADDR(rblock, 1, rblocklen);
+	tkp = XFS_BTREE_KEY_ADDR(xfs_bmdr, dblock, 1);
+	fpp = XFS_BMAP_BROOT_PTR_ADDR(rblock, 1, rblocklen);
+	tpp = XFS_BTREE_PTR_ADDR(xfs_bmdr, dblock, 1, dmxr);
+	dmxr = be16_to_cpu(dblock->bb_numrecs);
+	memcpy(tkp, fkp, sizeof(*fkp) * dmxr);
+	memcpy(tpp, fpp, sizeof(*fpp) * dmxr);
 }

 /*
- * Lookup the record. The cursor is made to point to it, based on dir.
+ * Check extent records, which have just been read, for
+ * any bit in the extent flag field. ASSERT on debug
+ * kernels, as this condition should not occur.
+ * Return an error condition (1) if any flags found,
+ * otherwise return 0.
  */
-STATIC int				/* error */
-xfs_bmbt_lookup(
-	xfs_btree_cur_t	*cur,
-	xfs_lookup_t	dir,
-	int		*stat)		/* success/failure */
-{
-	xfs_bmbt_block_t	*block=NULL;
-	xfs_buf_t		*bp;
-	xfs_daddr_t		d;
-	xfs_sfiloff_t		diff;
-	int			error;		/* error return value */
-	xfs_fsblock_t		fsbno=0;
-	int			high;
-	int			i;
-	int			keyno=0;
-	xfs_bmbt_key_t		*kkbase=NULL;
-	xfs_bmbt_key_t		*kkp;
-	xfs_bmbt_rec_t		*krbase=NULL;
-	xfs_bmbt_rec_t		*krp;
-	int			level;
-	int			low;
-	xfs_mount_t		*mp;
-	xfs_bmbt_ptr_t		*pp;
-	xfs_bmbt_irec_t		*rp;
-	xfs_fileoff_t		startoff;
-	xfs_trans_t		*tp;
-
-	XFS_STATS_INC(xs_bmbt_lookup);
-	XFS_BMBT_TRACE_CURSOR(cur, ENTRY);
-	XFS_BMBT_TRACE_ARGI(cur, (int)dir);
-	tp = cur->bc_tp;
-	mp = cur->bc_mp;
-	rp = &cur->bc_rec.b;
-	for (level = cur->bc_nlevels - 1, diff = 1; level >= 0; level--) {
-		if (level < cur->bc_nlevels - 1) {
-			d = XFS_FSB_TO_DADDR(mp, fsbno);
-			bp = cur->bc_bufs[level];
-			if (bp && XFS_BUF_ADDR(bp) != d)
-				bp = NULL;
-			if (!bp) {
-				if ((error = xfs_btree_read_bufl(mp, tp, fsbno,
-						0, &bp, XFS_BMAP_BTREE_REF))) {
-					XFS_BMBT_TRACE_CURSOR(cur, ERROR);
-					return error;
-				}
-				xfs_btree_setbuf(cur, level, bp);
-				block = XFS_BUF_TO_BMBT_BLOCK(bp);
-				if ((error = xfs_btree_check_lblock(cur, block,
-						level, bp))) {
-					XFS_BMBT_TRACE_CURSOR(cur, ERROR);
-					return error;
-				}
-			} else
-				block = XFS_BUF_TO_BMBT_BLOCK(bp);
-		} else
-			block = xfs_bmbt_get_block(cur, level, &bp);
-		if (diff == 0)
-			keyno = 1;
-		else {
-			if (level > 0)
-				kkbase = XFS_BMAP_KEY_IADDR(block, 1, cur);
-			else
-				krbase = XFS_BMAP_REC_IADDR(block, 1, cur);
-			low = 1;
-			if (!(high = be16_to_cpu(block->bb_numrecs))) {
-				ASSERT(level == 0);
-				cur->bc_ptrs[0] = dir != XFS_LOOKUP_LE;
-				XFS_BMBT_TRACE_CURSOR(cur, EXIT);
-				*stat = 0;
-				return 0;
-			}
-			while (low <= high) {
-				XFS_STATS_INC(xs_bmbt_compare);
-				keyno = (low + high) >> 1;
-				if (level > 0) {
-					kkp = kkbase + keyno - 1;
-					startoff = be64_to_cpu(kkp->br_startoff);
-				} else {
-					krp = krbase + keyno - 1;
-					startoff = xfs_bmbt_disk_get_startoff(krp);
-				}
-				diff = (xfs_sfiloff_t)
-						(startoff - rp->br_startoff);
-				if (diff < 0)
-					low = keyno + 1;
-				else if (diff > 0)
-					high = keyno - 1;
-				else
-					break;
-			}
-		}
-		if (level > 0) {
-			if (diff > 0 && --keyno < 1)
-				keyno = 1;
-			pp = XFS_BMAP_PTR_IADDR(block, keyno, cur);
-			fsbno = be64_to_cpu(*pp);
-#ifdef DEBUG
-			if ((error = xfs_btree_check_lptr(cur, fsbno, level))) {
-				XFS_BMBT_TRACE_CURSOR(cur, ERROR);
-				return error;
-			}
-#endif
-			cur->bc_ptrs[level] = keyno;
-		}
-	}
-	if (dir != XFS_LOOKUP_LE && diff < 0) {
-		keyno++;
-		/*
-		 * If ge search and we went off the end of the block, but it's
-		 * not the last block, we're in the wrong block.
-		 */
-		if (dir == XFS_LOOKUP_GE && keyno > be16_to_cpu(block->bb_numrecs) &&
-		    be64_to_cpu(block->bb_rightsib) != NULLDFSBNO) {
-			cur->bc_ptrs[0] = keyno;
-			if ((error = xfs_bmbt_increment(cur, 0, &i))) {
-				XFS_BMBT_TRACE_CURSOR(cur, ERROR);
-				return error;
-			}
-			XFS_WANT_CORRUPTED_RETURN(i == 1);
-			XFS_BMBT_TRACE_CURSOR(cur, EXIT);
-			*stat = 1;
-			return 0;
+
+int
+xfs_check_nostate_extents(
+	xfs_ifork_t		*ifp,
+	xfs_extnum_t		idx,
+	xfs_extnum_t		num)
+{
+	for (; num > 0; num--, idx++) {
+		xfs_bmbt_rec_host_t *ep = xfs_iext_get_ext(ifp, idx);
+		if ((ep->l0 >>
+		     (64 - BMBT_EXNTFLAG_BITLEN)) != 0) {
+			ASSERT(0);
+			return 1;
 		}
 	}
-	else if (dir == XFS_LOOKUP_LE && diff > 0)
-		keyno--;
-	cur->bc_ptrs[0] = keyno;
-	if (keyno == 0 || keyno > be16_to_cpu(block->bb_numrecs)) {
-		XFS_BMBT_TRACE_CURSOR(cur, EXIT);
-		*stat = 0;
-	} else {
-		XFS_BMBT_TRACE_CURSOR(cur, EXIT);
-		*stat = ((dir != XFS_LOOKUP_EQ) || (diff == 0));
-	}
 	return 0;
 }

 /*
- * Move 1 record left from cur/level if possible.
- * Update cur to reflect the new path.
+ * BMBT function vectors for core btree operations */ -STATIC int /* error */ -xfs_bmbt_lshift( - xfs_btree_cur_t *cur, - int level, - int *stat) /* success/failure */ -{ - int error; /* error return value */ -#ifdef DEBUG - int i; /* loop counter */ -#endif - xfs_bmbt_key_t key; /* bmap btree key */ - xfs_buf_t *lbp; /* left buffer pointer */ - xfs_bmbt_block_t *left; /* left btree block */ - xfs_bmbt_key_t *lkp=NULL; /* left btree key */ - xfs_bmbt_ptr_t *lpp; /* left address pointer */ - int lrecs; /* left record count */ - xfs_bmbt_rec_t *lrp=NULL; /* left record pointer */ - xfs_mount_t *mp; /* file system mount point */ - xfs_buf_t *rbp; /* right buffer pointer */ - xfs_bmbt_block_t *right; /* right btree block */ - xfs_bmbt_key_t *rkp=NULL; /* right btree key */ - xfs_bmbt_ptr_t *rpp=NULL; /* right address pointer */ - xfs_bmbt_rec_t *rrp=NULL; /* right record pointer */ - int rrecs; /* right record count */ - - XFS_BMBT_TRACE_CURSOR(cur, ENTRY); - XFS_BMBT_TRACE_ARGI(cur, level); - if (level == cur->bc_nlevels - 1) { - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 0; - return 0; - } - rbp = cur->bc_bufs[level]; - right = XFS_BUF_TO_BMBT_BLOCK(rbp); -#ifdef DEBUG - if ((error = xfs_btree_check_lblock(cur, right, level, rbp))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } -#endif - if (be64_to_cpu(right->bb_leftsib) == NULLDFSBNO) { - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 0; - return 0; - } - if (cur->bc_ptrs[level] <= 1) { - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 0; - return 0; - } - mp = cur->bc_mp; - if ((error = xfs_btree_read_bufl(mp, cur->bc_tp, be64_to_cpu(right->bb_leftsib), 0, - &lbp, XFS_BMAP_BTREE_REF))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } - left = XFS_BUF_TO_BMBT_BLOCK(lbp); - if ((error = xfs_btree_check_lblock(cur, left, level, lbp))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } - if (be16_to_cpu(left->bb_numrecs) == XFS_BMAP_BLOCK_IMAXRECS(level, cur)) { - XFS_BMBT_TRACE_CURSOR(cur, 
EXIT); - *stat = 0; - return 0; - } - lrecs = be16_to_cpu(left->bb_numrecs) + 1; - if (level > 0) { - lkp = XFS_BMAP_KEY_IADDR(left, lrecs, cur); - rkp = XFS_BMAP_KEY_IADDR(right, 1, cur); - *lkp = *rkp; - xfs_bmbt_log_keys(cur, lbp, lrecs, lrecs); - lpp = XFS_BMAP_PTR_IADDR(left, lrecs, cur); - rpp = XFS_BMAP_PTR_IADDR(right, 1, cur); -#ifdef DEBUG - if ((error = xfs_btree_check_lptr_disk(cur, *rpp, level))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } -#endif - *lpp = *rpp; - xfs_bmbt_log_ptrs(cur, lbp, lrecs, lrecs); - } else { - lrp = XFS_BMAP_REC_IADDR(left, lrecs, cur); - rrp = XFS_BMAP_REC_IADDR(right, 1, cur); - *lrp = *rrp; - xfs_bmbt_log_recs(cur, lbp, lrecs, lrecs); - } - left->bb_numrecs = cpu_to_be16(lrecs); - xfs_bmbt_log_block(cur, lbp, XFS_BB_NUMRECS); -#ifdef DEBUG - if (level > 0) - xfs_btree_check_key(XFS_BTNUM_BMAP, lkp - 1, lkp); - else - xfs_btree_check_rec(XFS_BTNUM_BMAP, lrp - 1, lrp); -#endif - rrecs = be16_to_cpu(right->bb_numrecs) - 1; - right->bb_numrecs = cpu_to_be16(rrecs); - xfs_bmbt_log_block(cur, rbp, XFS_BB_NUMRECS); - if (level > 0) { -#ifdef DEBUG - for (i = 0; i < rrecs; i++) { - if ((error = xfs_btree_check_lptr_disk(cur, rpp[i + 1], - level))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } - } -#endif - memmove(rkp, rkp + 1, rrecs * sizeof(*rkp)); - memmove(rpp, rpp + 1, rrecs * sizeof(*rpp)); - xfs_bmbt_log_keys(cur, rbp, 1, rrecs); - xfs_bmbt_log_ptrs(cur, rbp, 1, rrecs); - } else { - memmove(rrp, rrp + 1, rrecs * sizeof(*rrp)); - xfs_bmbt_log_recs(cur, rbp, 1, rrecs); - key.br_startoff = cpu_to_be64(xfs_bmbt_disk_get_startoff(rrp)); - rkp = &key; - } - if ((error = xfs_bmbt_updkey(cur, rkp, level + 1))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } - cur->bc_ptrs[level]--; - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 1; - return 0; -} /* - * Move 1 record right from cur/level if possible. - * Update cur to reflect the new path. 
+ * Get the block pointer for the given level of the cursor. + * Fill in the buffer pointer, if applicable. */ -STATIC int /* error */ -xfs_bmbt_rshift( +STATIC xfs_btree_block_t * +xfs_bmbt_get_block( xfs_btree_cur_t *cur, int level, - int *stat) /* success/failure */ + xfs_buf_t **bpp) { - int error; /* error return value */ - int i; /* loop counter */ - xfs_bmbt_key_t key; /* bmap btree key */ - xfs_buf_t *lbp; /* left buffer pointer */ - xfs_bmbt_block_t *left; /* left btree block */ - xfs_bmbt_key_t *lkp; /* left btree key */ - xfs_bmbt_ptr_t *lpp; /* left address pointer */ - xfs_bmbt_rec_t *lrp; /* left record pointer */ - xfs_mount_t *mp; /* file system mount point */ - xfs_buf_t *rbp; /* right buffer pointer */ - xfs_bmbt_block_t *right; /* right btree block */ - xfs_bmbt_key_t *rkp; /* right btree key */ - xfs_bmbt_ptr_t *rpp; /* right address pointer */ - xfs_bmbt_rec_t *rrp=NULL; /* right record pointer */ - struct xfs_btree_cur *tcur; /* temporary btree cursor */ - - XFS_BMBT_TRACE_CURSOR(cur, ENTRY); - XFS_BMBT_TRACE_ARGI(cur, level); - if (level == cur->bc_nlevels - 1) { - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 0; - return 0; - } - lbp = cur->bc_bufs[level]; - left = XFS_BUF_TO_BMBT_BLOCK(lbp); -#ifdef DEBUG - if ((error = xfs_btree_check_lblock(cur, left, level, lbp))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } -#endif - if (be64_to_cpu(left->bb_rightsib) == NULLDFSBNO) { - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 0; - return 0; - } - if (cur->bc_ptrs[level] >= be16_to_cpu(left->bb_numrecs)) { - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 0; - return 0; - } - mp = cur->bc_mp; - if ((error = xfs_btree_read_bufl(mp, cur->bc_tp, be64_to_cpu(left->bb_rightsib), 0, - &rbp, XFS_BMAP_BTREE_REF))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } - right = XFS_BUF_TO_BMBT_BLOCK(rbp); - if ((error = xfs_btree_check_lblock(cur, right, level, rbp))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } - if 
(be16_to_cpu(right->bb_numrecs) == XFS_BMAP_BLOCK_IMAXRECS(level, cur)) { - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 0; - return 0; - } - if (level > 0) { - lkp = XFS_BMAP_KEY_IADDR(left, be16_to_cpu(left->bb_numrecs), cur); - lpp = XFS_BMAP_PTR_IADDR(left, be16_to_cpu(left->bb_numrecs), cur); - rkp = XFS_BMAP_KEY_IADDR(right, 1, cur); - rpp = XFS_BMAP_PTR_IADDR(right, 1, cur); -#ifdef DEBUG - for (i = be16_to_cpu(right->bb_numrecs) - 1; i >= 0; i--) { - if ((error = xfs_btree_check_lptr_disk(cur, rpp[i], level))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } - } -#endif - memmove(rkp + 1, rkp, be16_to_cpu(right->bb_numrecs) * sizeof(*rkp)); - memmove(rpp + 1, rpp, be16_to_cpu(right->bb_numrecs) * sizeof(*rpp)); -#ifdef DEBUG - if ((error = xfs_btree_check_lptr_disk(cur, *lpp, level))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } -#endif - *rkp = *lkp; - *rpp = *lpp; - xfs_bmbt_log_keys(cur, rbp, 1, be16_to_cpu(right->bb_numrecs) + 1); - xfs_bmbt_log_ptrs(cur, rbp, 1, be16_to_cpu(right->bb_numrecs) + 1); + xfs_ifork_t *ifp; + xfs_bmbt_block_t *rval; + + if (level < cur->bc_nlevels - 1) { + *bpp = cur->bc_bufs[level]; + rval = XFS_BUF_TO_BMBT_BLOCK(*bpp); } else { - lrp = XFS_BMAP_REC_IADDR(left, be16_to_cpu(left->bb_numrecs), cur); - rrp = XFS_BMAP_REC_IADDR(right, 1, cur); - memmove(rrp + 1, rrp, be16_to_cpu(right->bb_numrecs) * sizeof(*rrp)); - *rrp = *lrp; - xfs_bmbt_log_recs(cur, rbp, 1, be16_to_cpu(right->bb_numrecs) + 1); - key.br_startoff = cpu_to_be64(xfs_bmbt_disk_get_startoff(rrp)); - rkp = &key; - } - be16_add(&left->bb_numrecs, -1); - xfs_bmbt_log_block(cur, lbp, XFS_BB_NUMRECS); - be16_add(&right->bb_numrecs, 1); -#ifdef DEBUG - if (level > 0) - xfs_btree_check_key(XFS_BTNUM_BMAP, rkp, rkp + 1); - else - xfs_btree_check_rec(XFS_BTNUM_BMAP, rrp, rrp + 1); -#endif - xfs_bmbt_log_block(cur, rbp, XFS_BB_NUMRECS); - if ((error = xfs_btree_dup_cursor(cur, &tcur))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } - i = 
xfs_btree_lastrec(tcur, level); - XFS_WANT_CORRUPTED_GOTO(i == 1, error0); - if ((error = xfs_bmbt_increment(tcur, level, &i))) { - XFS_BMBT_TRACE_CURSOR(tcur, ERROR); - goto error1; - } - XFS_WANT_CORRUPTED_GOTO(i == 1, error0); - if ((error = xfs_bmbt_updkey(tcur, rkp, level + 1))) { - XFS_BMBT_TRACE_CURSOR(tcur, ERROR); - goto error1; + *bpp = NULL; + ifp = XFS_IFORK_PTR(cur->bc_private.b.ip, + cur->bc_private.b.whichfork); + rval = ifp->if_broot; } - xfs_btree_del_cursor(tcur, XFS_BTREE_NOERROR); - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 1; + return (xfs_btree_block_t *)rval; +} + + +STATIC int +xfs_bmbt_get_buf( + xfs_btree_cur_t *cur, + xfs_btree_ptr_t *ptr, + int flags, + xfs_buf_t **bpp) +{ + xfs_buf_t *bp; + + BUG_ON(be64_to_cpu(ptr->u.bmbt) == 0); + bp = xfs_btree_get_bufl(cur->bc_mp, cur->bc_tp, + be64_to_cpu(ptr->u.bmbt), flags); + *bpp = bp; return 0; -error0: - XFS_BMBT_TRACE_CURSOR(cur, ERROR); -error1: - xfs_btree_del_cursor(tcur, XFS_BTREE_ERROR); - return error; + } -/* - * Determine the extent state. - */ -/* ARGSUSED */ -STATIC xfs_exntst_t -xfs_extent_state( - xfs_filblks_t blks, - int extent_flag) +STATIC int +xfs_bmbt_read_buf( + xfs_btree_cur_t *cur, + xfs_btree_ptr_t *ptr, + int flags, + xfs_buf_t **bpp) +{ + BUG_ON(be64_to_cpu(ptr->u.bmbt) == 0); + return xfs_btree_read_bufl(cur->bc_mp, cur->bc_tp, + be64_to_cpu(ptr->u.bmbt), flags, + bpp, XFS_BMAP_BTREE_REF); +} + +STATIC xfs_btree_block_t * +xfs_bmbt_buf_to_block( + xfs_btree_cur_t *cur, + xfs_buf_t *bp) { - if (extent_flag) { - ASSERT(blks != 0); /* saved for DMIG */ - return XFS_EXT_UNWRITTEN; - } - return XFS_EXT_NORM; + /* XFS_BUF_TO_BMBT_BLOCK(rbp); */ + return XFS_BUF_TO_BLOCK(bp); } +STATIC void +xfs_bmbt_buf_to_ptr( + xfs_btree_cur_t *cur, + xfs_buf_t *bp, + xfs_btree_ptr_t *ptr) +{ + ptr->u.bmbt = cpu_to_be64(XFS_DADDR_TO_FSB(cur->bc_mp, XFS_BUF_ADDR(bp))); +} -/* - * Split cur/level block in half. 
- * Return new block number and its first record (to be inserted into parent). - */ -STATIC int /* error */ -xfs_bmbt_split( - xfs_btree_cur_t *cur, - int level, - xfs_fsblock_t *bnop, - __uint64_t *startoff, - xfs_btree_cur_t **curp, - int *stat) /* success/failure */ +STATIC int +xfs_bmbt_alloc_block( + xfs_btree_cur_t *cur, + xfs_btree_ptr_t *start, + xfs_btree_ptr_t *new, + int length, + int *stat) { xfs_alloc_arg_t args; /* block allocation args */ int error; /* error return value */ - int i; /* loop counter */ - xfs_fsblock_t lbno; /* left sibling block number */ - xfs_buf_t *lbp; /* left buffer pointer */ - xfs_bmbt_block_t *left; /* left btree block */ - xfs_bmbt_key_t *lkp; /* left btree key */ - xfs_bmbt_ptr_t *lpp; /* left address pointer */ - xfs_bmbt_rec_t *lrp; /* left record pointer */ - xfs_buf_t *rbp; /* right buffer pointer */ - xfs_bmbt_block_t *right; /* right btree block */ - xfs_bmbt_key_t *rkp; /* right btree key */ - xfs_bmbt_ptr_t *rpp; /* right address pointer */ - xfs_bmbt_block_t *rrblock; /* right-right btree block */ - xfs_buf_t *rrbp; /* right-right buffer pointer */ - xfs_bmbt_rec_t *rrp; /* right record pointer */ + xfs_fsblock_t sbno = be64_to_cpu(start->u.bmbt); - XFS_BMBT_TRACE_CURSOR(cur, ENTRY); - XFS_BMBT_TRACE_ARGIFK(cur, level, *bnop, *startoff); + memset(&args, 0, sizeof(args)); args.tp = cur->bc_tp; args.mp = cur->bc_mp; - lbp = cur->bc_bufs[level]; - lbno = XFS_DADDR_TO_FSB(args.mp, XFS_BUF_ADDR(lbp)); - left = XFS_BUF_TO_BMBT_BLOCK(lbp); args.fsbno = cur->bc_private.b.firstblock; args.firstblock = args.fsbno; if (args.fsbno == NULLFSBLOCK) { - args.fsbno = lbno; + args.fsbno = sbno; args.type = XFS_ALLOCTYPE_START_BNO; } else args.type = XFS_ALLOCTYPE_NEAR_BNO; @@ -1503,15 +581,16 @@ xfs_bmbt_split( args.minlen = args.maxlen = args.prod = 1; args.wasdel = cur->bc_private.b.flags & XFS_BTCUR_BPRV_WASDEL; if (!args.wasdel && xfs_trans_get_block_res(args.tp) == 0) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); + 
XFS_BTREE_TRACE_CURSOR(cur, XBT_ERROR); return XFS_ERROR(ENOSPC); } - if ((error = xfs_alloc_vextent(&args))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); + error = xfs_alloc_vextent(&args); + if (error) { + XFS_BTREE_TRACE_CURSOR(cur, XBT_ERROR); return error; } if (args.fsbno == NULLFSBLOCK) { - XFS_BMBT_TRACE_CURSOR(cur, EXIT); + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); *stat = 0; return 0; } @@ -1522,602 +601,383 @@ xfs_bmbt_split( xfs_trans_log_inode(args.tp, cur->bc_private.b.ip, XFS_ILOG_CORE); XFS_TRANS_MOD_DQUOT_BYINO(args.mp, args.tp, cur->bc_private.b.ip, XFS_TRANS_DQ_BCOUNT, 1L); - rbp = xfs_btree_get_bufl(args.mp, args.tp, args.fsbno, 0); - right = XFS_BUF_TO_BMBT_BLOCK(rbp); -#ifdef DEBUG - if ((error = xfs_btree_check_lblock(cur, left, level, rbp))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } -#endif - right->bb_magic = cpu_to_be32(XFS_BMAP_MAGIC); - right->bb_level = left->bb_level; - right->bb_numrecs = cpu_to_be16(be16_to_cpu(left->bb_numrecs) / 2); - if ((be16_to_cpu(left->bb_numrecs) & 1) && - cur->bc_ptrs[level] <= be16_to_cpu(right->bb_numrecs) + 1) - be16_add(&right->bb_numrecs, 1); - i = be16_to_cpu(left->bb_numrecs) - be16_to_cpu(right->bb_numrecs) + 1; - if (level > 0) { - lkp = XFS_BMAP_KEY_IADDR(left, i, cur); - lpp = XFS_BMAP_PTR_IADDR(left, i, cur); - rkp = XFS_BMAP_KEY_IADDR(right, 1, cur); - rpp = XFS_BMAP_PTR_IADDR(right, 1, cur); -#ifdef DEBUG - for (i = 0; i < be16_to_cpu(right->bb_numrecs); i++) { - if ((error = xfs_btree_check_lptr_disk(cur, lpp[i], level))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } - } -#endif - memcpy(rkp, lkp, be16_to_cpu(right->bb_numrecs) * sizeof(*rkp)); - memcpy(rpp, lpp, be16_to_cpu(right->bb_numrecs) * sizeof(*rpp)); - xfs_bmbt_log_keys(cur, rbp, 1, be16_to_cpu(right->bb_numrecs)); - xfs_bmbt_log_ptrs(cur, rbp, 1, be16_to_cpu(right->bb_numrecs)); - *startoff = be64_to_cpu(rkp->br_startoff); - } else { - lrp = XFS_BMAP_REC_IADDR(left, i, cur); - rrp = XFS_BMAP_REC_IADDR(right, 1, 
cur); - memcpy(rrp, lrp, be16_to_cpu(right->bb_numrecs) * sizeof(*rrp)); - xfs_bmbt_log_recs(cur, rbp, 1, be16_to_cpu(right->bb_numrecs)); - *startoff = xfs_bmbt_disk_get_startoff(rrp); - } - be16_add(&left->bb_numrecs, -(be16_to_cpu(right->bb_numrecs))); - right->bb_rightsib = left->bb_rightsib; - left->bb_rightsib = cpu_to_be64(args.fsbno); - right->bb_leftsib = cpu_to_be64(lbno); - xfs_bmbt_log_block(cur, rbp, XFS_BB_ALL_BITS); - xfs_bmbt_log_block(cur, lbp, XFS_BB_NUMRECS | XFS_BB_RIGHTSIB); - if (be64_to_cpu(right->bb_rightsib) != NULLDFSBNO) { - if ((error = xfs_btree_read_bufl(args.mp, args.tp, - be64_to_cpu(right->bb_rightsib), 0, &rrbp, - XFS_BMAP_BTREE_REF))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } - rrblock = XFS_BUF_TO_BMBT_BLOCK(rrbp); - if ((error = xfs_btree_check_lblock(cur, rrblock, level, rrbp))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } - rrblock->bb_leftsib = cpu_to_be64(args.fsbno); - xfs_bmbt_log_block(cur, rrbp, XFS_BB_LEFTSIB); - } - if (cur->bc_ptrs[level] > be16_to_cpu(left->bb_numrecs) + 1) { - xfs_btree_setbuf(cur, level, rbp); - cur->bc_ptrs[level] -= be16_to_cpu(left->bb_numrecs); - } - if (level + 1 < cur->bc_nlevels) { - if ((error = xfs_btree_dup_cursor(cur, curp))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } - (*curp)->bc_ptrs[level + 1]++; - } - *bnop = args.fsbno; - XFS_BMBT_TRACE_CURSOR(cur, EXIT); + + new->u.bmbt = cpu_to_be64(args.fsbno); *stat = 1; return 0; } - -/* - * Update keys for the record. 
- */ STATIC int -xfs_bmbt_updkey( - xfs_btree_cur_t *cur, - xfs_bmbt_key_t *keyp, /* on-disk format */ - int level) +xfs_bmbt_free_block( + xfs_btree_cur_t *cur, + xfs_buf_t *bp, + int size) { - xfs_bmbt_block_t *block; - xfs_buf_t *bp; -#ifdef DEBUG - int error; -#endif - xfs_bmbt_key_t *kp; - int ptr; + xfs_mount_t *mp = cur->bc_mp; + xfs_inode_t *ip = cur->bc_private.b.ip; - ASSERT(level >= 1); - XFS_BMBT_TRACE_CURSOR(cur, ENTRY); - XFS_BMBT_TRACE_ARGIK(cur, level, keyp); - for (ptr = 1; ptr == 1 && level < cur->bc_nlevels; level++) { - block = xfs_bmbt_get_block(cur, level, &bp); -#ifdef DEBUG - if ((error = xfs_btree_check_lblock(cur, block, level, bp))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } -#endif - ptr = cur->bc_ptrs[level]; - kp = XFS_BMAP_KEY_IADDR(block, ptr, cur); - *kp = *keyp; - xfs_bmbt_log_keys(cur, bp, ptr, ptr); - } - XFS_BMBT_TRACE_CURSOR(cur, EXIT); + xfs_bmap_add_free(XFS_DADDR_TO_FSB(mp, XFS_BUF_ADDR(bp)), 1, + cur->bc_private.b.flist, mp); + ip->i_d.di_nblocks--; + xfs_trans_log_inode(cur->bc_tp, ip, XFS_ILOG_CORE); + XFS_TRANS_MOD_DQUOT_BYINO(mp, cur->bc_tp, ip, XFS_TRANS_DQ_BCOUNT, -1L); + xfs_trans_binval(cur->bc_tp, bp); return 0; } + /* - * Convert on-disk form of btree root to in-memory form. + * Log fields from the btree block header. 
*/ void -xfs_bmdr_to_bmbt( - xfs_bmdr_block_t *dblock, - int dblocklen, - xfs_bmbt_block_t *rblock, - int rblocklen) -{ - int dmxr; - xfs_bmbt_key_t *fkp; - __be64 *fpp; - xfs_bmbt_key_t *tkp; - __be64 *tpp; - - rblock->bb_magic = cpu_to_be32(XFS_BMAP_MAGIC); - rblock->bb_level = dblock->bb_level; - ASSERT(be16_to_cpu(rblock->bb_level) > 0); - rblock->bb_numrecs = dblock->bb_numrecs; - rblock->bb_leftsib = cpu_to_be64(NULLDFSBNO); - rblock->bb_rightsib = cpu_to_be64(NULLDFSBNO); - dmxr = (int)XFS_BTREE_BLOCK_MAXRECS(dblocklen, xfs_bmdr, 0); - fkp = XFS_BTREE_KEY_ADDR(xfs_bmdr, dblock, 1); - tkp = XFS_BMAP_BROOT_KEY_ADDR(rblock, 1, rblocklen); - fpp = XFS_BTREE_PTR_ADDR(xfs_bmdr, dblock, 1, dmxr); - tpp = XFS_BMAP_BROOT_PTR_ADDR(rblock, 1, rblocklen); - dmxr = be16_to_cpu(dblock->bb_numrecs); - memcpy(tkp, fkp, sizeof(*fkp) * dmxr); - memcpy(tpp, fpp, sizeof(*fpp) * dmxr); -} +xfs_bmbt_log_block( + xfs_btree_cur_t *cur, /* btree cursor */ + xfs_buf_t *bp, /* buffer containing btree block */ + int fields) /* mask of fields: XFS_BB_... */ +{ + int first; /* first byte offset logged */ + int last; /* last byte offset logged */ + static const short offsets[] = { /* table of offsets */ + offsetof(xfs_bmbt_block_t, bb_magic), + offsetof(xfs_bmbt_block_t, bb_level), + offsetof(xfs_bmbt_block_t, bb_numrecs), + offsetof(xfs_bmbt_block_t, bb_leftsib), + offsetof(xfs_bmbt_block_t, bb_rightsib), + sizeof(xfs_bmbt_block_t) + }; -/* - * Decrement cursor by one record at the level. - * For nonzero levels the leaf-ward information is untouched. 
- */ -int /* error */ -xfs_bmbt_decrement( - xfs_btree_cur_t *cur, - int level, - int *stat) /* success/failure */ -{ - xfs_bmbt_block_t *block; - xfs_buf_t *bp; - int error; /* error return value */ - xfs_fsblock_t fsbno; - int lev; - xfs_mount_t *mp; - xfs_trans_t *tp; - - XFS_BMBT_TRACE_CURSOR(cur, ENTRY); - XFS_BMBT_TRACE_ARGI(cur, level); - ASSERT(level < cur->bc_nlevels); - if (level < cur->bc_nlevels - 1) - xfs_btree_readahead(cur, level, XFS_BTCUR_LEFTRA); - if (--cur->bc_ptrs[level] > 0) { - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 1; - return 0; - } - block = xfs_bmbt_get_block(cur, level, &bp); -#ifdef DEBUG - if ((error = xfs_btree_check_lblock(cur, block, level, bp))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } -#endif - if (be64_to_cpu(block->bb_leftsib) == NULLDFSBNO) { - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 0; - return 0; - } - for (lev = level + 1; lev < cur->bc_nlevels; lev++) { - if (--cur->bc_ptrs[lev] > 0) - break; - if (lev < cur->bc_nlevels - 1) - xfs_btree_readahead(cur, lev, XFS_BTCUR_LEFTRA); - } - if (lev == cur->bc_nlevels) { - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 0; - return 0; - } - tp = cur->bc_tp; - mp = cur->bc_mp; - for (block = xfs_bmbt_get_block(cur, lev, &bp); lev > level; ) { - fsbno = be64_to_cpu(*XFS_BMAP_PTR_IADDR(block, cur->bc_ptrs[lev], cur)); - if ((error = xfs_btree_read_bufl(mp, tp, fsbno, 0, &bp, - XFS_BMAP_BTREE_REF))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } - lev--; - xfs_btree_setbuf(cur, lev, bp); - block = XFS_BUF_TO_BMBT_BLOCK(bp); - if ((error = xfs_btree_check_lblock(cur, block, lev, bp))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } - cur->bc_ptrs[lev] = be16_to_cpu(block->bb_numrecs); - } - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 1; - return 0; + XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY); + XFS_BTREE_TRACE_ARGBI(cur, bp, fields); + if (bp) { + xfs_btree_offsets(fields, offsets, XFS_BB_NUM_BITS, &first, + &last); + xfs_trans_log_buf(cur->bc_tp, 
bp, first, last); + } else + xfs_trans_log_inode(cur->bc_tp, cur->bc_private.b.ip, + XFS_ILOG_FBROOT(cur->bc_private.b.whichfork)); + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); } -/* - * Delete the record pointed to by cur. - */ -int /* error */ -xfs_bmbt_delete( +static const struct xfs_btree_block_ops xfs_bmbt_blkops = { + .get_buf = xfs_bmbt_get_buf, + .read_buf = xfs_bmbt_read_buf, + .get_block = xfs_bmbt_get_block, + .buf_to_block = xfs_bmbt_buf_to_block, + .buf_to_ptr = xfs_bmbt_buf_to_ptr, + .log_block = xfs_bmbt_log_block, + .check_block = xfs_btree_check_lblock, + + .alloc_block = xfs_bmbt_alloc_block, + .free_block = xfs_bmbt_free_block, + + .get_sibling = xfs_btree_get_lsibling, + .set_sibling = xfs_btree_set_lsibling, + .init_sibling = xfs_btree_init_sibling, +}; + +STATIC int +xfs_bmbt_get_iminrecs( xfs_btree_cur_t *cur, - int *stat) /* success/failure */ + int lev) { - int error; /* error return value */ - int i; - int level; - - XFS_BMBT_TRACE_CURSOR(cur, ENTRY); - for (level = 0, i = 2; i == 2; level++) { - if ((error = xfs_bmbt_delrec(cur, level, &i))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } - } - if (i == 0) { - for (level = 1; level < cur->bc_nlevels; level++) { - if (cur->bc_ptrs[level] == 0) { - if ((error = xfs_bmbt_decrement(cur, level, - &i))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } - break; - } - } - } - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = i; - return 0; + return XFS_BMAP_BLOCK_IMINRECS(lev, cur); } -/* - * Convert a compressed bmap extent record to an uncompressed form. - * This code must be in sync with the routines xfs_bmbt_get_startoff, - * xfs_bmbt_get_startblock, xfs_bmbt_get_blockcount and xfs_bmbt_get_state. 
- */ - -STATIC_INLINE void -__xfs_bmbt_get_all( - __uint64_t l0, - __uint64_t l1, - xfs_bmbt_irec_t *s) +STATIC int +xfs_bmbt_get_imaxrecs( + xfs_btree_cur_t *cur, + int lev) { - int ext_flag; - xfs_exntst_t st; - - ext_flag = (int)(l0 >> (64 - BMBT_EXNTFLAG_BITLEN)); - s->br_startoff = ((xfs_fileoff_t)l0 & - XFS_MASK64LO(64 - BMBT_EXNTFLAG_BITLEN)) >> 9; -#if XFS_BIG_BLKNOS - s->br_startblock = (((xfs_fsblock_t)l0 & XFS_MASK64LO(9)) << 43) | - (((xfs_fsblock_t)l1) >> 21); -#else -#ifdef DEBUG - { - xfs_dfsbno_t b; + return XFS_BMAP_BLOCK_IMAXRECS(lev, cur); +} - b = (((xfs_dfsbno_t)l0 & XFS_MASK64LO(9)) << 43) | - (((xfs_dfsbno_t)l1) >> 21); - ASSERT((b >> 32) == 0 || ISNULLDSTARTBLOCK(b)); - s->br_startblock = (xfs_fsblock_t)b; - } -#else /* !DEBUG */ - s->br_startblock = (xfs_fsblock_t)(((xfs_dfsbno_t)l1) >> 21); -#endif /* DEBUG */ -#endif /* XFS_BIG_BLKNOS */ - s->br_blockcount = (xfs_filblks_t)(l1 & XFS_MASK64LO(21)); - /* This is xfs_extent_state() in-line */ - if (ext_flag) { - ASSERT(s->br_blockcount != 0); /* saved for DMIG */ - st = XFS_EXT_UNWRITTEN; - } else - st = XFS_EXT_NORM; - s->br_state = st; +STATIC int +xfs_bmbt_get_dminrecs( + xfs_btree_cur_t *cur, + int lev) +{ + return XFS_BMAP_BLOCK_DMINRECS(lev, cur); } -void -xfs_bmbt_get_all( - xfs_bmbt_rec_host_t *r, - xfs_bmbt_irec_t *s) +STATIC int +xfs_bmbt_get_dmaxrecs( + xfs_btree_cur_t *cur, + int lev) { - __xfs_bmbt_get_all(r->l0, r->l1, s); + return XFS_BMAP_BLOCK_DMAXRECS(lev, cur); } -/* - * Get the block pointer for the given level of the cursor. - * Fill in the buffer pointer, if applicable. 
- */ -xfs_bmbt_block_t * -xfs_bmbt_get_block( +STATIC int +xfs_btree_get_numrecs( xfs_btree_cur_t *cur, - int level, - xfs_buf_t **bpp) + xfs_btree_block_t *block) { - xfs_ifork_t *ifp; - xfs_bmbt_block_t *rval; + return be16_to_cpu(block->bb_h.bb_numrecs); +} - if (level < cur->bc_nlevels - 1) { - *bpp = cur->bc_bufs[level]; - rval = XFS_BUF_TO_BMBT_BLOCK(*bpp); - } else { - *bpp = NULL; - ifp = XFS_IFORK_PTR(cur->bc_private.b.ip, - cur->bc_private.b.whichfork); - rval = ifp->if_broot; - } - return rval; +STATIC void +xfs_btree_set_numrecs( + xfs_btree_cur_t *cur, + xfs_btree_block_t *block, + int numrecs) +{ + block->bb_h.bb_numrecs = cpu_to_be16(numrecs); } -/* - * Extract the blockcount field from an in memory bmap extent record. - */ -xfs_filblks_t -xfs_bmbt_get_blockcount( - xfs_bmbt_rec_host_t *r) +STATIC void +xfs_bmbt_init_key_from_rec( + xfs_btree_cur_t *cur, + xfs_btree_key_t *key, + xfs_btree_rec_t *rec) { - return (xfs_filblks_t)(r->l1 & XFS_MASK64LO(21)); + key->u.bmbt.br_startoff = cpu_to_be64( + xfs_bmbt_disk_get_startoff(&rec->u.bmbt)); } /* - * Extract the startblock field from an in memory bmap extent record. + * initial value of ptr for lookup */ -xfs_fsblock_t -xfs_bmbt_get_startblock( - xfs_bmbt_rec_host_t *r) +STATIC void +xfs_bmbt_init_ptr_from_cur( + xfs_btree_cur_t *cur, + xfs_btree_ptr_t *ptr) { -#if XFS_BIG_BLKNOS - return (((xfs_fsblock_t)r->l0 & XFS_MASK64LO(9)) << 43) | - (((xfs_fsblock_t)r->l1) >> 21); -#else -#ifdef DEBUG - xfs_dfsbno_t b; - - b = (((xfs_dfsbno_t)r->l0 & XFS_MASK64LO(9)) << 43) | - (((xfs_dfsbno_t)r->l1) >> 21); - ASSERT((b >> 32) == 0 || ISNULLDSTARTBLOCK(b)); - return (xfs_fsblock_t)b; -#else /* !DEBUG */ - return (xfs_fsblock_t)(((xfs_dfsbno_t)r->l1) >> 21); -#endif /* DEBUG */ -#endif /* XFS_BIG_BLKNOS */ + ptr->u.bmbt = 0; } -/* - * Extract the startoff field from an in memory bmap extent record. 
- */ -xfs_fileoff_t -xfs_bmbt_get_startoff( - xfs_bmbt_rec_host_t *r) +STATIC void +xfs_bmbt_init_rec_from_key( + xfs_btree_cur_t *cur, + xfs_btree_key_t *key, + xfs_btree_rec_t *rec) { - return ((xfs_fileoff_t)r->l0 & - XFS_MASK64LO(64 - BMBT_EXNTFLAG_BITLEN)) >> 9; + BUG_ON(be64_to_cpu(key->u.bmbt.br_startoff) == 0); + xfs_bmbt_disk_set_allf(&rec->u.bmbt, + be64_to_cpu(key->u.bmbt.br_startoff), + 0, 0, XFS_EXT_NORM); } -xfs_exntst_t -xfs_bmbt_get_state( - xfs_bmbt_rec_host_t *r) +STATIC void +xfs_bmbt_init_rec_from_cur( + xfs_btree_cur_t *cur, + xfs_btree_rec_t *rec) { - int ext_flag; + BUG_ON(cur->bc_rec.b.br_startoff == 0); + xfs_bmbt_disk_set_all(&rec->u.bmbt, &cur->bc_rec.b); +} - ext_flag = (int)((r->l0) >> (64 - BMBT_EXNTFLAG_BITLEN)); - return xfs_extent_state(xfs_bmbt_get_blockcount(r), - ext_flag); +STATIC xfs_btree_key_t * +xfs_bmbt_key_addr( + xfs_btree_cur_t *cur, + int index, + xfs_btree_block_t *block) +{ + return (xfs_btree_key_t *)XFS_BMAP_KEY_IADDR(&block->bb_h, index, cur); } -/* Endian flipping versions of the bmbt extraction functions */ -void -xfs_bmbt_disk_get_all( - xfs_bmbt_rec_t *r, - xfs_bmbt_irec_t *s) +STATIC xfs_btree_ptr_t * +xfs_bmbt_ptr_addr( + xfs_btree_cur_t *cur, + int index, + xfs_btree_block_t *block) { - __xfs_bmbt_get_all(be64_to_cpu(r->l0), be64_to_cpu(r->l1), s); + return (xfs_btree_ptr_t *)XFS_BMAP_PTR_IADDR(&block->bb_h, index, cur); } -/* - * Extract the blockcount field from an on disk bmap extent record. - */ -xfs_filblks_t -xfs_bmbt_disk_get_blockcount( - xfs_bmbt_rec_t *r) +STATIC xfs_btree_rec_t * +xfs_bmbt_rec_addr( + xfs_btree_cur_t *cur, + int index, + xfs_btree_block_t *block) { - return (xfs_filblks_t)(be64_to_cpu(r->l1) & XFS_MASK64LO(21)); + return (xfs_btree_rec_t *)XFS_BMAP_REC_IADDR(&block->bb_h, index, cur); } -/* - * Extract the startoff field from a disk format bmap extent record. 
- */ -xfs_fileoff_t -xfs_bmbt_disk_get_startoff( - xfs_bmbt_rec_t *r) +STATIC int64_t +xfs_bmbt_key_diff( + xfs_btree_cur_t *cur, + xfs_btree_key_t *key) { - return ((xfs_fileoff_t)be64_to_cpu(r->l0) & - XFS_MASK64LO(64 - BMBT_EXNTFLAG_BITLEN)) >> 9; + return (int64_t)(be64_to_cpu(key->u.bmbt.br_startoff) - + cur->bc_rec.b.br_startoff); } -/* - * Increment cursor by one record at the level. - * For nonzero levels the leaf-ward information is untouched. - */ -int /* error */ -xfs_bmbt_increment( +STATIC xfs_daddr_t +xfs_bmbt_ptr_to_daddr( xfs_btree_cur_t *cur, - int level, - int *stat) /* success/failure */ + xfs_btree_ptr_t *ptr) { - xfs_bmbt_block_t *block; - xfs_buf_t *bp; - int error; /* error return value */ - xfs_fsblock_t fsbno; - int lev; - xfs_mount_t *mp; - xfs_trans_t *tp; - - XFS_BMBT_TRACE_CURSOR(cur, ENTRY); - XFS_BMBT_TRACE_ARGI(cur, level); - ASSERT(level < cur->bc_nlevels); - if (level < cur->bc_nlevels - 1) - xfs_btree_readahead(cur, level, XFS_BTCUR_RIGHTRA); - block = xfs_bmbt_get_block(cur, level, &bp); -#ifdef DEBUG - if ((error = xfs_btree_check_lblock(cur, block, level, bp))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } -#endif - if (++cur->bc_ptrs[level] <= be16_to_cpu(block->bb_numrecs)) { - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 1; - return 0; - } - if (be64_to_cpu(block->bb_rightsib) == NULLDFSBNO) { - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 0; - return 0; - } - for (lev = level + 1; lev < cur->bc_nlevels; lev++) { - block = xfs_bmbt_get_block(cur, lev, &bp); -#ifdef DEBUG - if ((error = xfs_btree_check_lblock(cur, block, lev, bp))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } -#endif - if (++cur->bc_ptrs[lev] <= be16_to_cpu(block->bb_numrecs)) - break; - if (lev < cur->bc_nlevels - 1) - xfs_btree_readahead(cur, lev, XFS_BTCUR_RIGHTRA); + return XFS_FSB_TO_DADDR(cur->bc_mp, be64_to_cpu(ptr->u.bmbt)); +} + +STATIC void +xfs_bmbt_move_keys( + xfs_btree_cur_t *cur, + xfs_btree_key_t *src_key, + 
xfs_btree_key_t *dst_key, + int from, + int to, + int numkeys) +{ + if (dst_key == NULL) { + /* moving within a block */ + xfs_bmbt_key_t *kp = &src_key->u.bmbt; + memmove(&kp[to], &kp[from], numkeys * sizeof(*kp)); + } else { + /* moving between blocks */ + memcpy(dst_key, src_key, numkeys * sizeof(xfs_bmbt_key_t)); } - if (lev == cur->bc_nlevels) { - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 0; - return 0; +} + +STATIC void +xfs_bmbt_move_ptrs( + xfs_btree_cur_t *cur, + xfs_btree_ptr_t *src_ptr, + xfs_btree_ptr_t *dst_ptr, + int from, + int to, + int numptrs) +{ + if (dst_ptr == NULL) { + /* moving within a block */ + xfs_bmbt_ptr_t *pp = &src_ptr->u.bmbt; + memmove(&pp[to], &pp[from], numptrs * sizeof(*pp)); + } else { + /* moving between blocks */ + memcpy(dst_ptr, src_ptr, numptrs * sizeof(xfs_bmbt_ptr_t)); } - tp = cur->bc_tp; - mp = cur->bc_mp; - for (block = xfs_bmbt_get_block(cur, lev, &bp); lev > level; ) { - fsbno = be64_to_cpu(*XFS_BMAP_PTR_IADDR(block, cur->bc_ptrs[lev], cur)); - if ((error = xfs_btree_read_bufl(mp, tp, fsbno, 0, &bp, - XFS_BMAP_BTREE_REF))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } - lev--; - xfs_btree_setbuf(cur, lev, bp); - block = XFS_BUF_TO_BMBT_BLOCK(bp); - if ((error = xfs_btree_check_lblock(cur, block, lev, bp))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } - cur->bc_ptrs[lev] = 1; +} + +STATIC void +xfs_bmbt_move_recs( + xfs_btree_cur_t *cur, + xfs_btree_rec_t *src_rec, + xfs_btree_rec_t *dst_rec, + int from, + int to, + int numrecs) +{ + if (dst_rec == NULL) { + /* moving within a block */ + xfs_bmbt_rec_t *rp = &src_rec->u.bmbt; + memmove(&rp[to], &rp[from], numrecs * sizeof(*rp)); + } else { + /* moving between blocks */ + memcpy(dst_rec, src_rec, numrecs * sizeof(xfs_bmbt_rec_t)); } - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 1; - return 0; +} + + +STATIC void +xfs_bmbt_set_key( + xfs_btree_cur_t *cur, + xfs_btree_key_t *key_addr, + int index, + xfs_btree_key_t *newkey) +{ + 
xfs_bmbt_key_t *kp = &key_addr->u.bmbt; + + kp[index] = newkey->u.bmbt; +} + +STATIC void +xfs_bmbt_set_ptr( + xfs_btree_cur_t *cur, + xfs_btree_ptr_t *ptr_addr, + int index, + xfs_btree_ptr_t *newptr) +{ + xfs_bmbt_ptr_t *pp = &ptr_addr->u.bmbt; + + pp[index] = newptr->u.bmbt; +} + +STATIC void +xfs_bmbt_set_rec( + xfs_btree_cur_t *cur, + xfs_btree_rec_t *rec_addr, + int index, + xfs_btree_rec_t *newrec) +{ + xfs_bmbt_rec_t *rp = &rec_addr->u.bmbt; + + rp[index] = newrec->u.bmbt; } /* - * Insert the current record at the point referenced by cur. + * Log keys from a btree block (nonleaf). */ -int /* error */ -xfs_bmbt_insert( +STATIC void +xfs_bmbt_log_keys( xfs_btree_cur_t *cur, - int *stat) /* success/failure */ + xfs_buf_t *bp, + int kfirst, + int klast) { - int error; /* error return value */ - int i; - int level; - xfs_fsblock_t nbno; - xfs_btree_cur_t *ncur; - xfs_bmbt_rec_t nrec; - xfs_btree_cur_t *pcur; - - XFS_BMBT_TRACE_CURSOR(cur, ENTRY); - level = 0; - nbno = NULLFSBLOCK; - xfs_bmbt_disk_set_all(&nrec, &cur->bc_rec.b); - ncur = NULL; - pcur = cur; - do { - if ((error = xfs_bmbt_insrec(pcur, level++, &nbno, &nrec, &ncur, - &i))) { - if (pcur != cur) - xfs_btree_del_cursor(pcur, XFS_BTREE_ERROR); - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } - XFS_WANT_CORRUPTED_GOTO(i == 1, error0); - if (pcur != cur && (ncur || nbno == NULLFSBLOCK)) { - cur->bc_nlevels = pcur->bc_nlevels; - cur->bc_private.b.allocated += - pcur->bc_private.b.allocated; - pcur->bc_private.b.allocated = 0; - ASSERT((cur->bc_private.b.firstblock != NULLFSBLOCK) || - XFS_IS_REALTIME_INODE(cur->bc_private.b.ip)); - cur->bc_private.b.firstblock = - pcur->bc_private.b.firstblock; - ASSERT(cur->bc_private.b.flist == - pcur->bc_private.b.flist); - xfs_btree_del_cursor(pcur, XFS_BTREE_NOERROR); - } - if (ncur) { - pcur = ncur; - ncur = NULL; - } - } while (nbno != NULLFSBLOCK); - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = i; - return 0; -error0: - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - 
return error; + xfs_trans_t *tp; + + XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY); + XFS_BTREE_TRACE_ARGBII(cur, bp, kfirst, klast); + tp = cur->bc_tp; + if (bp) { + xfs_bmbt_block_t *block; + int first; + xfs_bmbt_key_t *kp; + int last; + + block = XFS_BUF_TO_BMBT_BLOCK(bp); + kp = XFS_BMAP_KEY_DADDR(block, 1, cur); + first = (int)((xfs_caddr_t)&kp[kfirst - 1] - (xfs_caddr_t)block); + last = (int)(((xfs_caddr_t)&kp[klast] - 1) - (xfs_caddr_t)block); + xfs_trans_log_buf(tp, bp, first, last); + } else { + xfs_inode_t *ip; + + ip = cur->bc_private.b.ip; + xfs_trans_log_inode(tp, ip, + XFS_ILOG_FBROOT(cur->bc_private.b.whichfork)); + } + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); } /* - * Log fields from the btree block header. + * Log block pointer fields from a btree block (nonleaf). */ -void -xfs_bmbt_log_block( - xfs_btree_cur_t *cur, - xfs_buf_t *bp, - int fields) +STATIC void +xfs_bmbt_log_ptrs( + xfs_btree_cur_t *cur, + xfs_buf_t *bp, + int pfirst, + int plast) { - int first; - int last; - xfs_trans_t *tp; - static const short offsets[] = { - offsetof(xfs_bmbt_block_t, bb_magic), - offsetof(xfs_bmbt_block_t, bb_level), - offsetof(xfs_bmbt_block_t, bb_numrecs), - offsetof(xfs_bmbt_block_t, bb_leftsib), - offsetof(xfs_bmbt_block_t, bb_rightsib), - sizeof(xfs_bmbt_block_t) - }; + xfs_trans_t *tp; - XFS_BMBT_TRACE_CURSOR(cur, ENTRY); - XFS_BMBT_TRACE_ARGBI(cur, bp, fields); + XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY); + XFS_BTREE_TRACE_ARGBII(cur, bp, pfirst, plast); tp = cur->bc_tp; if (bp) { - xfs_btree_offsets(fields, offsets, XFS_BB_NUM_BITS, &first, - &last); + xfs_bmbt_block_t *block; + int first; + int last; + xfs_bmbt_ptr_t *pp; + + block = XFS_BUF_TO_BMBT_BLOCK(bp); + pp = XFS_BMAP_PTR_DADDR(block, 1, cur); + first = (int)((xfs_caddr_t)&pp[pfirst - 1] - (xfs_caddr_t)block); + last = (int)(((xfs_caddr_t)&pp[plast] - 1) - (xfs_caddr_t)block); xfs_trans_log_buf(tp, bp, first, last); - } else - xfs_trans_log_inode(tp, cur->bc_private.b.ip, + } else { + xfs_inode_t *ip; + + 
ip = cur->bc_private.b.ip; + xfs_trans_log_inode(tp, ip, XFS_ILOG_FBROOT(cur->bc_private.b.whichfork)); - XFS_BMBT_TRACE_CURSOR(cur, EXIT); + } + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); } /* - * Log record values from the btree block. + * Log records from a btree block (leaf). */ void xfs_bmbt_log_recs( @@ -2130,445 +990,432 @@ xfs_bmbt_log_recs( int first; int last; xfs_bmbt_rec_t *rp; - xfs_trans_t *tp; - XFS_BMBT_TRACE_CURSOR(cur, ENTRY); - XFS_BMBT_TRACE_ARGBII(cur, bp, rfirst, rlast); + XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY); + XFS_BTREE_TRACE_ARGBII(cur, bp, rfirst, rlast); ASSERT(bp); - tp = cur->bc_tp; block = XFS_BUF_TO_BMBT_BLOCK(bp); rp = XFS_BMAP_REC_DADDR(block, 1, cur); first = (int)((xfs_caddr_t)&rp[rfirst - 1] - (xfs_caddr_t)block); last = (int)(((xfs_caddr_t)&rp[rlast] - 1) - (xfs_caddr_t)block); - xfs_trans_log_buf(tp, bp, first, last); - XFS_BMBT_TRACE_CURSOR(cur, EXIT); + xfs_trans_log_buf(cur->bc_tp, bp, first, last); + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); } -int /* error */ -xfs_bmbt_lookup_eq( - xfs_btree_cur_t *cur, - xfs_fileoff_t off, - xfs_fsblock_t bno, - xfs_filblks_t len, - int *stat) /* success/failure */ -{ - cur->bc_rec.b.br_startoff = off; - cur->bc_rec.b.br_startblock = bno; - cur->bc_rec.b.br_blockcount = len; - return xfs_bmbt_lookup(cur, XFS_LOOKUP_EQ, stat); -} +static const struct xfs_btree_record_ops xfs_bmbt_recops = { + .get_minrecs = xfs_bmbt_get_iminrecs, + .get_maxrecs = xfs_bmbt_get_imaxrecs, + .get_dminrecs = xfs_bmbt_get_dminrecs, + .get_dmaxrecs = xfs_bmbt_get_dmaxrecs, + .get_numrecs = xfs_btree_get_numrecs, + .set_numrecs = xfs_btree_set_numrecs, + + .init_key_from_rec = xfs_bmbt_init_key_from_rec, + .init_ptr_from_cur = xfs_bmbt_init_ptr_from_cur, + .init_rec_from_key = xfs_bmbt_init_rec_from_key, + .init_rec_from_cur = xfs_bmbt_init_rec_from_cur, + + .key_addr = xfs_bmbt_key_addr, + .ptr_addr = xfs_bmbt_ptr_addr, + .rec_addr = xfs_bmbt_rec_addr, + + .key_diff = xfs_bmbt_key_diff, + .ptr_to_daddr = 
xfs_bmbt_ptr_to_daddr, + + .move_keys = xfs_bmbt_move_keys, + .move_ptrs = xfs_bmbt_move_ptrs, + .move_recs = xfs_bmbt_move_recs, + + .set_key = xfs_bmbt_set_key, + .set_ptr = xfs_bmbt_set_ptr, + .set_rec = xfs_bmbt_set_rec, + + .log_keys = xfs_bmbt_log_keys, + .log_ptrs = xfs_bmbt_log_ptrs, + .log_recs = xfs_bmbt_log_recs, -int /* error */ -xfs_bmbt_lookup_ge( - xfs_btree_cur_t *cur, - xfs_fileoff_t off, - xfs_fsblock_t bno, - xfs_filblks_t len, - int *stat) /* success/failure */ -{ - cur->bc_rec.b.br_startoff = off; - cur->bc_rec.b.br_startblock = bno; - cur->bc_rec.b.br_blockcount = len; - return xfs_bmbt_lookup(cur, XFS_LOOKUP_GE, stat); -} + .check_ptrs = xfs_btree_check_lptr, +}; -/* - * Give the bmap btree a new root block. Copy the old broot contents - * down into a real block and make the broot point to it. - */ -int /* error */ -xfs_bmbt_newroot( +STATIC int /* error */ +xfs_bmbt_new_root( xfs_btree_cur_t *cur, /* btree cursor */ - int *logflags, /* logging flags for inode */ int *stat) /* return status - 0 fail */ { - xfs_alloc_arg_t args; /* allocation arguments */ - xfs_bmbt_block_t *block; /* bmap btree block */ - xfs_buf_t *bp; /* buffer for block */ - xfs_bmbt_block_t *cblock; /* child btree block */ - xfs_bmbt_key_t *ckp; /* child key pointer */ - xfs_bmbt_ptr_t *cpp; /* child ptr pointer */ - int error; /* error return code */ -#ifdef DEBUG - int i; /* loop counter */ -#endif - xfs_bmbt_key_t *kp; /* pointer to bmap btree key */ - int level; /* btree level */ - xfs_bmbt_ptr_t *pp; /* pointer to bmap block addr */ + int logflags = 0; + int error; + + error = xfs_bmbt_newroot(cur, &logflags, stat); + if (!(error || *stat == 0)) + xfs_trans_log_inode(cur->bc_tp, cur->bc_private.b.ip, logflags); + return error; +} + +STATIC int +xfs_bmbt_killroot( + xfs_btree_cur_t *cur, + int lev, /* unused */ + xfs_btree_ptr_t *newroot) /* unused */ +{ + xfs_btree_block_t *block; + xfs_btree_block_t *cblock; + xfs_buf_t *cbp; + xfs_btree_key_t *ckp; + 
xfs_btree_ptr_t *cpp; + int i; + xfs_btree_key_t *kp; + xfs_inode_t *ip; + xfs_ifork_t *ifp; + int level; + xfs_btree_ptr_t *pp; + + ASSERT(newroot == NULL); + ASSERT(lev == -1); - XFS_BMBT_TRACE_CURSOR(cur, ENTRY); + XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY); level = cur->bc_nlevels - 1; - block = xfs_bmbt_get_block(cur, level, &bp); + ASSERT(level >= 1); /* - * Copy the root into a real block. + * Don't deal with the root block needs to be a leaf case. + * We're just going to turn the thing back into extents anyway. */ - args.mp = cur->bc_mp; - pp = XFS_BMAP_PTR_IADDR(block, 1, cur); - args.tp = cur->bc_tp; - args.fsbno = cur->bc_private.b.firstblock; - args.mod = args.minleft = args.alignment = args.total = args.isfl = - args.userdata = args.minalignslop = 0; - args.minlen = args.maxlen = args.prod = 1; - args.wasdel = cur->bc_private.b.flags & XFS_BTCUR_BPRV_WASDEL; - args.firstblock = args.fsbno; - if (args.fsbno == NULLFSBLOCK) { -#ifdef DEBUG - if ((error = xfs_btree_check_lptr_disk(cur, *pp, level))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } -#endif - args.fsbno = be64_to_cpu(*pp); - args.type = XFS_ALLOCTYPE_START_BNO; - } else - args.type = XFS_ALLOCTYPE_NEAR_BNO; - if ((error = xfs_alloc_vextent(&args))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } - if (args.fsbno == NULLFSBLOCK) { - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *stat = 0; - return 0; + if (level == 1) + goto out0; + + block = xfs_bmbt_get_block(cur, level, &cbp); + /* + * Give up if the root has multiple children. + */ + if (be16_to_cpu(block->bb_h.bb_numrecs) != 1) + goto out0; + /* + * Only do this if the next level will fit. + * Then the data must be copied up to the inode, + * instead of freeing the root you free the next level. 
+ */ + cbp = cur->bc_bufs[level - 1]; + cblock = xfs_bmbt_buf_to_block(cur, cbp); + if (be16_to_cpu(cblock->bb_h.bb_numrecs) > xfs_bmbt_get_dmaxrecs(cur, level)) + goto out0; + + ASSERT(be64_to_cpu(cblock->bb_h.bb_leftsib) == NULLDFSBNO); + ASSERT(be64_to_cpu(cblock->bb_h.bb_rightsib) == NULLDFSBNO); + ip = cur->bc_private.b.ip; + ifp = XFS_IFORK_PTR(ip, cur->bc_private.b.whichfork); + ASSERT(xfs_bmbt_get_imaxrecs(cur, level) == + XFS_BMAP_BROOT_MAXRECS(ifp->if_broot_bytes)); + i = (int)(be16_to_cpu(cblock->bb_h.bb_numrecs) - xfs_bmbt_get_imaxrecs(cur, level)); + if (i) { + xfs_iroot_realloc(ip, i, cur->bc_private.b.whichfork); + block = (xfs_btree_block_t *)ifp->if_broot; } - ASSERT(args.len == 1); - cur->bc_private.b.firstblock = args.fsbno; - cur->bc_private.b.allocated++; - cur->bc_private.b.ip->i_d.di_nblocks++; - XFS_TRANS_MOD_DQUOT_BYINO(args.mp, args.tp, cur->bc_private.b.ip, - XFS_TRANS_DQ_BCOUNT, 1L); - bp = xfs_btree_get_bufl(args.mp, cur->bc_tp, args.fsbno, 0); - cblock = XFS_BUF_TO_BMBT_BLOCK(bp); - *cblock = *block; - be16_add(&block->bb_level, 1); - block->bb_numrecs = cpu_to_be16(1); - cur->bc_nlevels++; - cur->bc_ptrs[level + 1] = 1; - kp = XFS_BMAP_KEY_IADDR(block, 1, cur); - ckp = XFS_BMAP_KEY_IADDR(cblock, 1, cur); - memcpy(ckp, kp, be16_to_cpu(cblock->bb_numrecs) * sizeof(*kp)); - cpp = XFS_BMAP_PTR_IADDR(cblock, 1, cur); -#ifdef DEBUG - for (i = 0; i < be16_to_cpu(cblock->bb_numrecs); i++) { - if ((error = xfs_btree_check_lptr_disk(cur, pp[i], level))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); + be16_add(&block->bb_h.bb_numrecs, i); + ASSERT(block->bb_h.bb_numrecs == cblock->bb_h.bb_numrecs); + kp = xfs_bmbt_key_addr(cur, 1, block); + ckp = xfs_bmbt_key_addr(cur, 1, cblock); + memcpy(kp, ckp, be16_to_cpu(block->bb_h.bb_numrecs) * sizeof(xfs_bmbt_key_t)); + pp = xfs_bmbt_ptr_addr(cur, 1, block); + cpp = xfs_bmbt_ptr_addr(cur, 1, cblock); +#ifdef DEBUG + for (i = 0; i < be16_to_cpu(cblock->bb_h.bb_numrecs); i++) { + int error; + error = 
xfs_btree_check_lptr_disk(cur, cpp, i, level - 1); + if (error) { + XFS_BTREE_TRACE_CURSOR(cur, XBT_ERROR); return error; } } #endif - memcpy(cpp, pp, be16_to_cpu(cblock->bb_numrecs) * sizeof(*pp)); -#ifdef DEBUG - if ((error = xfs_btree_check_lptr(cur, args.fsbno, level))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } -#endif - *pp = cpu_to_be64(args.fsbno); - xfs_iroot_realloc(cur->bc_private.b.ip, 1 - be16_to_cpu(cblock->bb_numrecs), - cur->bc_private.b.whichfork); - xfs_btree_setbuf(cur, level, bp); - /* - * Do all this logging at the end so that - * the root is at the right level. - */ - xfs_bmbt_log_block(cur, bp, XFS_BB_ALL_BITS); - xfs_bmbt_log_keys(cur, bp, 1, be16_to_cpu(cblock->bb_numrecs)); - xfs_bmbt_log_ptrs(cur, bp, 1, be16_to_cpu(cblock->bb_numrecs)); - XFS_BMBT_TRACE_CURSOR(cur, EXIT); - *logflags |= - XFS_ILOG_CORE | XFS_ILOG_FBROOT(cur->bc_private.b.whichfork); - *stat = 1; + memcpy(pp, cpp, be16_to_cpu(block->bb_h.bb_numrecs) * sizeof(xfs_bmbt_ptr_t)); + + xfs_bmbt_free_block(cur, cbp, 1); + cur->bc_bufs[level - 1] = NULL; + be16_add(&block->bb_h.bb_level, -1); + xfs_trans_log_inode(cur->bc_tp, ip, + XFS_ILOG_CORE | XFS_ILOG_FBROOT(cur->bc_private.b.whichfork)); + cur->bc_nlevels--; +out0: + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); return 0; } -/* - * Set all the fields in a bmap extent record from the arguments. - */ -void -xfs_bmbt_set_allf( - xfs_bmbt_rec_host_t *r, - xfs_fileoff_t startoff, - xfs_fsblock_t startblock, - xfs_filblks_t blockcount, - xfs_exntst_t state) +STATIC int +xfs_bmbt_realloc_root( + xfs_btree_cur_t *cur, + int index) { - int extent_flag = (state == XFS_EXT_NORM) ? 
0 : 1; - - ASSERT(state == XFS_EXT_NORM || state == XFS_EXT_UNWRITTEN); - ASSERT((startoff & XFS_MASK64HI(64-BMBT_STARTOFF_BITLEN)) == 0); - ASSERT((blockcount & XFS_MASK64HI(64-BMBT_BLOCKCOUNT_BITLEN)) == 0); + xfs_inode_t *ip = cur->bc_private.b.ip; + xfs_iroot_realloc(ip, index, cur->bc_private.b.whichfork); + return 0; +} -#if XFS_BIG_BLKNOS - ASSERT((startblock & XFS_MASK64HI(64-BMBT_STARTBLOCK_BITLEN)) == 0); +STATIC void +xfs_bmbt_update_cursor( + xfs_btree_cur_t *src, + xfs_btree_cur_t *dst) +{ + ASSERT((dst->bc_private.b.firstblock != NULLFSBLOCK) || + (dst->bc_private.b.ip->i_d.di_flags & XFS_DIFLAG_REALTIME)); + ASSERT(dst->bc_private.b.flist == src->bc_private.b.flist); - r->l0 = ((xfs_bmbt_rec_base_t)extent_flag << 63) | - ((xfs_bmbt_rec_base_t)startoff << 9) | - ((xfs_bmbt_rec_base_t)startblock >> 43); - r->l1 = ((xfs_bmbt_rec_base_t)startblock << 21) | - ((xfs_bmbt_rec_base_t)blockcount & - (xfs_bmbt_rec_base_t)XFS_MASK64LO(21)); -#else /* !XFS_BIG_BLKNOS */ - if (ISNULLSTARTBLOCK(startblock)) { - r->l0 = ((xfs_bmbt_rec_base_t)extent_flag << 63) | - ((xfs_bmbt_rec_base_t)startoff << 9) | - (xfs_bmbt_rec_base_t)XFS_MASK64LO(9); - r->l1 = XFS_MASK64HI(11) | - ((xfs_bmbt_rec_base_t)startblock << 21) | - ((xfs_bmbt_rec_base_t)blockcount & - (xfs_bmbt_rec_base_t)XFS_MASK64LO(21)); - } else { - r->l0 = ((xfs_bmbt_rec_base_t)extent_flag << 63) | - ((xfs_bmbt_rec_base_t)startoff << 9); - r->l1 = ((xfs_bmbt_rec_base_t)startblock << 21) | - ((xfs_bmbt_rec_base_t)blockcount & - (xfs_bmbt_rec_base_t)XFS_MASK64LO(21)); - } -#endif /* XFS_BIG_BLKNOS */ + dst->bc_private.b.allocated += src->bc_private.b.allocated; + src->bc_private.b.allocated = 0; + dst->bc_private.b.firstblock = src->bc_private.b.firstblock; } +static const struct xfs_btree_cur_ops xfs_bmbt_curops = { + .new_root = xfs_bmbt_new_root, + .realloc_root = xfs_bmbt_realloc_root, + .kill_root = xfs_bmbt_killroot, + .update_cursor =xfs_bmbt_update_cursor, +}; + +#if defined(XFS_BTREE_TRACE) + /* - * Set 
all the fields in a bmap extent record from the uncompressed form. + * Global bmbt trace buffer */ -void -xfs_bmbt_set_all( - xfs_bmbt_rec_host_t *r, - xfs_bmbt_irec_t *s) +ktrace_t *xfs_bmbt_trace_buf; +/* + * Add a trace buffer entry for the arguments given to the routine, + * generic form. + */ +STATIC void +xfs_bmbt_trace_enter( + const char *func, + xfs_btree_cur_t *cur, + char *s, + int type, + int line, + __psunsigned_t a0, + __psunsigned_t a1, + __psunsigned_t a2, + __psunsigned_t a3, + __psunsigned_t a4, + __psunsigned_t a5, + __psunsigned_t a6, + __psunsigned_t a7, + __psunsigned_t a8, + __psunsigned_t a9, + __psunsigned_t a10) { - xfs_bmbt_set_allf(r, s->br_startoff, s->br_startblock, - s->br_blockcount, s->br_state); + xfs_inode_t *ip; + int whichfork; + + ip = cur->bc_private.b.ip; + whichfork = cur->bc_private.b.whichfork; + ktrace_enter(xfs_bmbt_trace_buf, + (void *)((__psint_t)type | (whichfork << 8) | (line << 16)), + (void *)func, (void *)s, (void *)ip, (void *)cur, + (void *)a0, (void *)a1, (void *)a2, (void *)a3, + (void *)a4, (void *)a5, (void *)a6, (void *)a7, + (void *)a8, (void *)a9, (void *)a10); + ASSERT(ip->i_btrace); + ktrace_enter(ip->i_btrace, + (void *)((__psint_t)type | (whichfork << 8) | (line << 16)), + (void *)func, (void *)s, (void *)ip, (void *)cur, + (void *)a0, (void *)a1, (void *)a2, (void *)a3, + (void *)a4, (void *)a5, (void *)a6, (void *)a7, + (void *)a8, (void *)a9, (void *)a10); } - -/* - * Set all the fields in a disk format bmap extent record from the arguments. - */ -void -xfs_bmbt_disk_set_allf( - xfs_bmbt_rec_t *r, - xfs_fileoff_t startoff, - xfs_fsblock_t startblock, - xfs_filblks_t blockcount, - xfs_exntst_t state) +STATIC void +xfs_bmbt_trace_cursor( + xfs_btree_cur_t *cur, + __uint32_t *s0, + __uint64_t *l0, + __uint64_t *l1) { - int extent_flag = (state == XFS_EXT_NORM) ? 
0 : 1; - - ASSERT(state == XFS_EXT_NORM || state == XFS_EXT_UNWRITTEN); - ASSERT((startoff & XFS_MASK64HI(64-BMBT_STARTOFF_BITLEN)) == 0); - ASSERT((blockcount & XFS_MASK64HI(64-BMBT_BLOCKCOUNT_BITLEN)) == 0); + xfs_bmbt_rec_host_t r; -#if XFS_BIG_BLKNOS - ASSERT((startblock & XFS_MASK64HI(64-BMBT_STARTBLOCK_BITLEN)) == 0); + xfs_bmbt_set_all(&r, &cur->bc_rec.b); - r->l0 = cpu_to_be64( - ((xfs_bmbt_rec_base_t)extent_flag << 63) | - ((xfs_bmbt_rec_base_t)startoff << 9) | - ((xfs_bmbt_rec_base_t)startblock >> 43)); - r->l1 = cpu_to_be64( - ((xfs_bmbt_rec_base_t)startblock << 21) | - ((xfs_bmbt_rec_base_t)blockcount & - (xfs_bmbt_rec_base_t)XFS_MASK64LO(21))); -#else /* !XFS_BIG_BLKNOS */ - if (ISNULLSTARTBLOCK(startblock)) { - r->l0 = cpu_to_be64( - ((xfs_bmbt_rec_base_t)extent_flag << 63) | - ((xfs_bmbt_rec_base_t)startoff << 9) | - (xfs_bmbt_rec_base_t)XFS_MASK64LO(9)); - r->l1 = cpu_to_be64(XFS_MASK64HI(11) | - ((xfs_bmbt_rec_base_t)startblock << 21) | - ((xfs_bmbt_rec_base_t)blockcount & - (xfs_bmbt_rec_base_t)XFS_MASK64LO(21))); - } else { - r->l0 = cpu_to_be64( - ((xfs_bmbt_rec_base_t)extent_flag << 63) | - ((xfs_bmbt_rec_base_t)startoff << 9)); - r->l1 = cpu_to_be64( - ((xfs_bmbt_rec_base_t)startblock << 21) | - ((xfs_bmbt_rec_base_t)blockcount & - (xfs_bmbt_rec_base_t)XFS_MASK64LO(21))); - } -#endif /* XFS_BIG_BLKNOS */ + *s0 = (cur->bc_private.b.flags << 16) | cur->bc_private.b.allocated; + *l0 = r.l0; + *l1 = r.l1; } -/* - * Set all the fields in a bmap extent record from the uncompressed form. 
- */ -void -xfs_bmbt_disk_set_all( - xfs_bmbt_rec_t *r, - xfs_bmbt_irec_t *s) -{ - xfs_bmbt_disk_set_allf(r, s->br_startoff, s->br_startblock, - s->br_blockcount, s->br_state); -} +STATIC void +xfs_bmbt_trace_record( + xfs_btree_cur_t *cur, + xfs_btree_rec_t *rec, + __uint64_t *l0, + __uint64_t *l1, + __uint64_t *l2) +{ + xfs_bmbt_irec_t s; + + xfs_bmbt_disk_get_all(&rec->u.bmbt, &s); + *l0 = s.br_startoff; + *l1 = s.br_startblock; + *l2 = s.br_blockcount; +} + +static const struct xfs_btree_trc_ops xfs_bmbt_trcops = { + .enter = xfs_bmbt_trace_enter, + .cursor = xfs_bmbt_trace_cursor, + .record = xfs_bmbt_trace_record, +}; +#endif -/* - * Set the blockcount field in a bmap extent record. - */ void -xfs_bmbt_set_blockcount( - xfs_bmbt_rec_host_t *r, - xfs_filblks_t v) +xfs_bmbt_init_cursor( + xfs_btree_cur_t *cur) { - ASSERT((v & XFS_MASK64HI(43)) == 0); - r->l1 = (r->l1 & (xfs_bmbt_rec_base_t)XFS_MASK64HI(43)) | - (xfs_bmbt_rec_base_t)(v & XFS_MASK64LO(21)); + cur->bc_flags = XFS_BTREE_ROOT_IN_INODE; + cur->bc_curops = &xfs_bmbt_curops; + cur->bc_blkops = &xfs_bmbt_blkops; + cur->bc_recops = &xfs_bmbt_recops; +#if defined(XFS_BTREE_TRACE) + cur->bc_trcops = &xfs_bmbt_trcops; +#endif } /* - * Set the startblock field in a bmap extent record. + * BMBT functions that are not covered by core btree code. + * Externally visible routines. 
*/ -void -xfs_bmbt_set_startblock( - xfs_bmbt_rec_host_t *r, - xfs_fsblock_t v) -{ -#if XFS_BIG_BLKNOS - ASSERT((v & XFS_MASK64HI(12)) == 0); - r->l0 = (r->l0 & (xfs_bmbt_rec_base_t)XFS_MASK64HI(55)) | - (xfs_bmbt_rec_base_t)(v >> 43); - r->l1 = (r->l1 & (xfs_bmbt_rec_base_t)XFS_MASK64LO(21)) | - (xfs_bmbt_rec_base_t)(v << 21); -#else /* !XFS_BIG_BLKNOS */ - if (ISNULLSTARTBLOCK(v)) { - r->l0 |= (xfs_bmbt_rec_base_t)XFS_MASK64LO(9); - r->l1 = (xfs_bmbt_rec_base_t)XFS_MASK64HI(11) | - ((xfs_bmbt_rec_base_t)v << 21) | - (r->l1 & (xfs_bmbt_rec_base_t)XFS_MASK64LO(21)); - } else { - r->l0 &= ~(xfs_bmbt_rec_base_t)XFS_MASK64LO(9); - r->l1 = ((xfs_bmbt_rec_base_t)v << 21) | - (r->l1 & (xfs_bmbt_rec_base_t)XFS_MASK64LO(21)); - } -#endif /* XFS_BIG_BLKNOS */ -} /* - * Set the startoff field in a bmap extent record. + * Update the record referred to by cur to the value given + * by [off, bno, len, state]. + * This either works (return 0) or gets an EFSCORRUPTED error. */ -void -xfs_bmbt_set_startoff( - xfs_bmbt_rec_host_t *r, - xfs_fileoff_t v) +int +xfs_bmbt_update( + xfs_btree_cur_t *cur, + xfs_fileoff_t off, + xfs_fsblock_t bno, + xfs_filblks_t len, + xfs_exntst_t state) { - ASSERT((v & XFS_MASK64HI(9)) == 0); - r->l0 = (r->l0 & (xfs_bmbt_rec_base_t) XFS_MASK64HI(1)) | - ((xfs_bmbt_rec_base_t)v << 9) | - (r->l0 & (xfs_bmbt_rec_base_t)XFS_MASK64LO(9)); + xfs_btree_rec_t rec; + + xfs_bmbt_disk_set_allf(&rec.u.bmbt, off, bno, len, state); + return xfs_btree_update(cur, &rec); } /* - * Set the extent state field in a bmap extent record. + * Lookup the record equal to [off, bno, len] in the btree given by cur. 
*/ -void -xfs_bmbt_set_state( - xfs_bmbt_rec_host_t *r, - xfs_exntst_t v) +int /* error */ +xfs_bmbt_lookup_eq( + xfs_btree_cur_t *cur, /* btree cursor */ + xfs_fileoff_t off, + xfs_fsblock_t bno, + xfs_filblks_t len, + int *stat) /* success/failure */ { - ASSERT(v == XFS_EXT_NORM || v == XFS_EXT_UNWRITTEN); - if (v == XFS_EXT_NORM) - r->l0 &= XFS_MASK64LO(64 - BMBT_EXNTFLAG_BITLEN); - else - r->l0 |= XFS_MASK64HI(BMBT_EXNTFLAG_BITLEN); + cur->bc_rec.b.br_startoff = off; + cur->bc_rec.b.br_startblock = bno; + cur->bc_rec.b.br_blockcount = len; + return xfs_btree_lookup(cur, XFS_LOOKUP_EQ, stat); } /* - * Convert in-memory form of btree root to on-disk form. + * Lookup the first record greater than or equal to [off, bno, len] + * in the btree given by cur. */ -void -xfs_bmbt_to_bmdr( - xfs_bmbt_block_t *rblock, - int rblocklen, - xfs_bmdr_block_t *dblock, - int dblocklen) +int /* error */ +xfs_bmbt_lookup_ge( + xfs_btree_cur_t *cur, /* btree cursor */ + xfs_fileoff_t off, + xfs_fsblock_t bno, + xfs_filblks_t len, + int *stat) /* success/failure */ { - int dmxr; - xfs_bmbt_key_t *fkp; - __be64 *fpp; - xfs_bmbt_key_t *tkp; - __be64 *tpp; - - ASSERT(be32_to_cpu(rblock->bb_magic) == XFS_BMAP_MAGIC); - ASSERT(be64_to_cpu(rblock->bb_leftsib) == NULLDFSBNO); - ASSERT(be64_to_cpu(rblock->bb_rightsib) == NULLDFSBNO); - ASSERT(be16_to_cpu(rblock->bb_level) > 0); - dblock->bb_level = rblock->bb_level; - dblock->bb_numrecs = rblock->bb_numrecs; - dmxr = (int)XFS_BTREE_BLOCK_MAXRECS(dblocklen, xfs_bmdr, 0); - fkp = XFS_BMAP_BROOT_KEY_ADDR(rblock, 1, rblocklen); - tkp = XFS_BTREE_KEY_ADDR(xfs_bmdr, dblock, 1); - fpp = XFS_BMAP_BROOT_PTR_ADDR(rblock, 1, rblocklen); - tpp = XFS_BTREE_PTR_ADDR(xfs_bmdr, dblock, 1, dmxr); - dmxr = be16_to_cpu(dblock->bb_numrecs); - memcpy(tkp, fkp, sizeof(*fkp) * dmxr); - memcpy(tpp, fpp, sizeof(*fpp) * dmxr); + cur->bc_rec.b.br_startoff = off; + cur->bc_rec.b.br_startblock = bno; + cur->bc_rec.b.br_blockcount = len; + return xfs_btree_lookup(cur, 
XFS_LOOKUP_GE, stat); } /* - * Update the record to the passed values. + * Give the bmap btree a new root block. Copy the old broot contents + * down into a real block and make the broot point to it. */ -int -xfs_bmbt_update( - xfs_btree_cur_t *cur, - xfs_fileoff_t off, - xfs_fsblock_t bno, - xfs_filblks_t len, - xfs_exntst_t state) +int /* error */ +xfs_bmbt_newroot( + xfs_btree_cur_t *cur, /* btree cursor */ + int *logflags, /* logging flags for inode */ + int *stat) /* return status - 0 fail */ { - xfs_bmbt_block_t *block; - xfs_buf_t *bp; - int error; - xfs_bmbt_key_t key; - int ptr; - xfs_bmbt_rec_t *rp; - - XFS_BMBT_TRACE_CURSOR(cur, ENTRY); - XFS_BMBT_TRACE_ARGFFFI(cur, (xfs_dfiloff_t)off, (xfs_dfsbno_t)bno, - (xfs_dfilblks_t)len, (int)state); - block = xfs_bmbt_get_block(cur, 0, &bp); + xfs_btree_block_t *block; /* bmap btree block */ + xfs_buf_t *bp; /* buffer for block */ + xfs_btree_block_t *cblock; /* child btree block */ + xfs_btree_key_t *ckp; /* child key pointer */ + xfs_btree_ptr_t *cpp; /* child ptr pointer */ + int error; /* error return code */ #ifdef DEBUG - if ((error = xfs_btree_check_lblock(cur, block, 0, bp))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; - } + int i; /* loop counter */ #endif - ptr = cur->bc_ptrs[0]; - rp = XFS_BMAP_REC_IADDR(block, ptr, cur); - xfs_bmbt_disk_set_allf(rp, off, bno, len, state); - xfs_bmbt_log_recs(cur, bp, ptr, ptr); - if (ptr > 1) { - XFS_BMBT_TRACE_CURSOR(cur, EXIT); + xfs_btree_key_t *kp; /* pointer to bmap btree key */ + int level; /* btree level */ + xfs_btree_ptr_t *pp; /* pointer to bmap block addr */ + xfs_btree_ptr_t nptr; /* pointer to bmap block addr */ + + XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY); + level = cur->bc_nlevels - 1; + block = xfs_bmbt_get_block(cur, level, &bp); + pp = xfs_bmbt_ptr_addr(cur, 1, block); + + /* + * Allocate the new block. + * If we can't do it, we're toast. Give up. 
+ */ + error = xfs_bmbt_alloc_block(cur, pp, &nptr, 1, stat); + if (error) + goto error0; + if (*stat == 0) { + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); return 0; } - key.br_startoff = cpu_to_be64(off); - if ((error = xfs_bmbt_updkey(cur, &key, 1))) { - XFS_BMBT_TRACE_CURSOR(cur, ERROR); - return error; + /* + * Copy the root into a real block. + */ + error = xfs_bmbt_get_buf(cur, &nptr, 0, &bp); + if (error) + goto error0; + cblock = xfs_bmbt_buf_to_block(cur, bp); + *cblock = *block; + be16_add(&block->bb_h.bb_level, 1); + block->bb_h.bb_numrecs = cpu_to_be16(1); + cur->bc_nlevels++; + cur->bc_ptrs[level + 1] = 1; + kp = xfs_bmbt_key_addr(cur, 1, block); + ckp = xfs_bmbt_key_addr(cur, 1, cblock); + memcpy(ckp, kp, be16_to_cpu(cblock->bb_h.bb_numrecs) * sizeof(xfs_bmbt_key_t)); + cpp = xfs_bmbt_ptr_addr(cur, 1, cblock); +#ifdef DEBUG + for (i = 0; i < be16_to_cpu(cblock->bb_h.bb_numrecs); i++) { + error = xfs_btree_check_lptr_disk(cur, pp[i], level); + if (error) + goto error0; } - XFS_BMBT_TRACE_CURSOR(cur, EXIT); +#endif + memcpy(cpp, pp, be16_to_cpu(cblock->bb_h.bb_numrecs) * sizeof(xfs_bmbt_ptr_t)); +#ifdef DEBUG + error = xfs_btree_check_lptr(cur, nptr.u.bmbt, level); + if (error) + goto error0; +#endif + memcpy(pp, &nptr, sizeof(xfs_bmbt_ptr_t)); + xfs_bmbt_realloc_root(cur, 1 - be16_to_cpu(cblock->bb_h.bb_numrecs)); + xfs_btree_setbuf(cur, level, bp); + /* + * Do all this logging at the end so that + * the root is at the right level. + */ + xfs_bmbt_log_block(cur, bp, XFS_BB_ALL_BITS); + xfs_bmbt_log_keys(cur, bp, 1, be16_to_cpu(cblock->bb_h.bb_numrecs)); + xfs_bmbt_log_ptrs(cur, bp, 1, be16_to_cpu(cblock->bb_h.bb_numrecs)); + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + *logflags |= + XFS_ILOG_CORE | XFS_ILOG_FBROOT(cur->bc_private.b.whichfork); + *stat = 1; return 0; +error0: + XFS_BTREE_TRACE_CURSOR(cur, XBT_ERROR); + return error; } -/* - * Check extent records, which have just been read, for - * any bit in the extent flag field. 
ASSERT on debug - * kernels, as this condition should not occur. - * Return an error condition (1) if any flags found, - * otherwise return 0. - */ - -int -xfs_check_nostate_extents( - xfs_ifork_t *ifp, - xfs_extnum_t idx, - xfs_extnum_t num) -{ - for (; num > 0; num--, idx++) { - xfs_bmbt_rec_host_t *ep = xfs_iext_get_ext(ifp, idx); - if ((ep->l0 >> - (64 - BMBT_EXNTFLAG_BITLEN)) != 0) { - ASSERT(0); - return 1; - } - } - return 0; -} Index: 2.6.x-xfs-new/fs/xfs/xfs_bmap_btree.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_bmap_btree.h 2007-08-02 22:13:10.000000000 +1000 +++ 2.6.x-xfs-new/fs/xfs/xfs_bmap_btree.h 2007-11-06 19:40:29.718673016 +1100 @@ -254,11 +254,7 @@ extern ktrace_t *xfs_bmbt_trace_buf; * Prototypes for xfs_bmap.c to call. */ extern void xfs_bmdr_to_bmbt(xfs_bmdr_block_t *, int, xfs_bmbt_block_t *, int); -extern int xfs_bmbt_decrement(struct xfs_btree_cur *, int, int *); -extern int xfs_bmbt_delete(struct xfs_btree_cur *, int *); extern void xfs_bmbt_get_all(xfs_bmbt_rec_host_t *r, xfs_bmbt_irec_t *s); -extern xfs_bmbt_block_t *xfs_bmbt_get_block(struct xfs_btree_cur *cur, - int, struct xfs_buf **bpp); extern xfs_filblks_t xfs_bmbt_get_blockcount(xfs_bmbt_rec_host_t *r); extern xfs_fsblock_t xfs_bmbt_get_startblock(xfs_bmbt_rec_host_t *r); extern xfs_fileoff_t xfs_bmbt_get_startoff(xfs_bmbt_rec_host_t *r); @@ -268,8 +264,6 @@ extern void xfs_bmbt_disk_get_all(xfs_bm extern xfs_filblks_t xfs_bmbt_disk_get_blockcount(xfs_bmbt_rec_t *r); extern xfs_fileoff_t xfs_bmbt_disk_get_startoff(xfs_bmbt_rec_t *r); -extern int xfs_bmbt_increment(struct xfs_btree_cur *, int, int *); -extern int xfs_bmbt_insert(struct xfs_btree_cur *, int *); extern void xfs_bmbt_log_block(struct xfs_btree_cur *, struct xfs_buf *, int); extern void xfs_bmbt_log_recs(struct xfs_btree_cur *, struct xfs_buf *, int, int); @@ -299,6 +293,8 @@ extern void xfs_bmbt_disk_set_allf(xfs_b extern void xfs_bmbt_to_bmdr(xfs_bmbt_block_t 
*, int, xfs_bmdr_block_t *, int); extern int xfs_bmbt_update(struct xfs_btree_cur *, xfs_fileoff_t, xfs_fsblock_t, xfs_filblks_t, xfs_exntst_t); +extern void xfs_bmbt_init_cursor(struct xfs_btree_cur *cur); + #endif /* __KERNEL__ */ Index: 2.6.x-xfs-new/fs/xfs/xfs_btree.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_btree.c 2007-08-24 22:24:45.000000000 +1000 +++ 2.6.x-xfs-new/fs/xfs/xfs_btree.c 2007-11-06 19:40:29.750668896 +1100 @@ -52,19 +52,7 @@ const __uint32_t xfs_magics[XFS_BTNUM_MA }; /* - * Prototypes for internal routines. - */ - -/* - * Checking routine: return maxrecs for the block. - */ -STATIC int /* number of records fitting in block */ -xfs_btree_maxrecs( - xfs_btree_cur_t *cur, /* btree cursor */ - xfs_btree_block_t *block);/* generic btree block pointer */ - -/* - * Internal routines. + * Internal prototypes */ /* @@ -75,7 +63,7 @@ STATIC xfs_btree_block_t * /* generic xfs_btree_get_block( xfs_btree_cur_t *cur, /* btree cursor */ int level, /* level in btree */ - struct xfs_buf **bpp); /* buffer containing the block */ + xfs_buf_t **bpp); /* buffer containing the block */ /* * Checking routine: return maxrecs for the block. @@ -177,65 +165,7 @@ xfs_btree_check_key( ASSERT(0); } } -#endif /* DEBUG */ - -/* - * Checking routine: check that long form block header is ok. 
- */ -/* ARGSUSED */ -int /* error (0 or EFSCORRUPTED) */ -xfs_btree_check_lblock( - xfs_btree_cur_t *cur, /* btree cursor */ - xfs_btree_lblock_t *block, /* btree long form block pointer */ - int level, /* level of the btree block */ - xfs_buf_t *bp) /* buffer for block, if any */ -{ - int lblock_ok; /* block passes checks */ - xfs_mount_t *mp; /* file system mount point */ - - mp = cur->bc_mp; - lblock_ok = - be32_to_cpu(block->bb_magic) == xfs_magics[cur->bc_btnum] && - be16_to_cpu(block->bb_level) == level && - be16_to_cpu(block->bb_numrecs) <= - xfs_btree_maxrecs(cur, (xfs_btree_block_t *)block) && - block->bb_leftsib && - (be64_to_cpu(block->bb_leftsib) == NULLDFSBNO || - XFS_FSB_SANITY_CHECK(mp, be64_to_cpu(block->bb_leftsib))) && - block->bb_rightsib && - (be64_to_cpu(block->bb_rightsib) == NULLDFSBNO || - XFS_FSB_SANITY_CHECK(mp, be64_to_cpu(block->bb_rightsib))); - if (unlikely(XFS_TEST_ERROR(!lblock_ok, mp, XFS_ERRTAG_BTREE_CHECK_LBLOCK, - XFS_RANDOM_BTREE_CHECK_LBLOCK))) { - if (bp) - xfs_buftrace("LBTREE ERROR", bp); - XFS_ERROR_REPORT("xfs_btree_check_lblock", XFS_ERRLEVEL_LOW, - mp); - return XFS_ERROR(EFSCORRUPTED); - } - return 0; -} - -/* - * Checking routine: check that (long) pointer is ok. - */ -int /* error (0 or EFSCORRUPTED) */ -xfs_btree_check_lptr( - xfs_btree_cur_t *cur, /* btree cursor */ - xfs_dfsbno_t ptr, /* btree block disk address */ - int level) /* btree block level */ -{ - xfs_mount_t *mp; /* file system mount point */ - - mp = cur->bc_mp; - XFS_WANT_CORRUPTED_RETURN( - level > 0 && - ptr != NULLDFSBNO && - XFS_FSB_SANITY_CHECK(mp, ptr)); - return 0; -} -#ifdef DEBUG /* * Debug routine: check that records are in the right order. */ @@ -296,13 +226,73 @@ xfs_btree_check_rec( #endif /* DEBUG */ /* + * Checking routine: check that long form block header is ok. 
+ */ +/* ARGSUSED */ +int /* error (0 or EFSCORRUPTED) */ +xfs_btree_check_lblock( + xfs_btree_cur_t *cur, /* btree cursor */ + xfs_btree_block_t *block, /* btree long form block pointer */ + int level, /* level of the btree block */ + xfs_buf_t *bp) /* buffer for block, if any */ +{ + int lblock_ok; /* block passes checks */ + xfs_mount_t *mp; /* file system mount point */ + xfs_btree_lblock_t *lb; /* btree long form block pointer */ + + mp = cur->bc_mp; + lb = (xfs_btree_lblock_t *)block; + lblock_ok = + be32_to_cpu(lb->bb_magic) == xfs_magics[cur->bc_btnum] && + be16_to_cpu(lb->bb_level) == level && + be16_to_cpu(lb->bb_numrecs) <= xfs_btree_maxrecs(cur, block) && + lb->bb_leftsib && + (be64_to_cpu(lb->bb_leftsib) == NULLDFSBNO || + XFS_FSB_SANITY_CHECK(mp, be64_to_cpu(lb->bb_leftsib))) && + lb->bb_rightsib && + (be64_to_cpu(lb->bb_rightsib) == NULLDFSBNO || + XFS_FSB_SANITY_CHECK(mp, be64_to_cpu(lb->bb_rightsib))); + if (unlikely(XFS_TEST_ERROR(!lblock_ok, mp, XFS_ERRTAG_BTREE_CHECK_LBLOCK, + XFS_RANDOM_BTREE_CHECK_LBLOCK))) { + if (bp) + xfs_buftrace("LBTREE ERROR", bp); + XFS_ERROR_REPORT("xfs_btree_check_lblock", XFS_ERRLEVEL_LOW, + mp); + return XFS_ERROR(EFSCORRUPTED); + } + return 0; +} + +/* + * Checking routine: check that (long) pointer is ok. + */ +int /* error (0 or EFSCORRUPTED) */ +xfs_btree_check_lptr( + xfs_btree_cur_t *cur, /* btree cursor */ + xfs_btree_ptr_t *ptr, /* btree block disk address */ + int index, /* offset from ptr */ + int level) /* btree block level */ +{ + xfs_mount_t *mp; /* file system mount point */ + xfs_fsblock_t bno; + + mp = cur->bc_mp; + bno = be64_to_cpu((&ptr->u.l)[index]); + XFS_WANT_CORRUPTED_RETURN(level > 0 && + bno != NULLDFSBNO && + XFS_FSB_SANITY_CHECK(mp, bno)); + return 0; +} + + +/* * Checking routine: check that block header is ok. 
*/ /* ARGSUSED */ int /* error (0 or EFSCORRUPTED) */ xfs_btree_check_sblock( xfs_btree_cur_t *cur, /* btree cursor */ - xfs_btree_sblock_t *block, /* btree short form block pointer */ + xfs_btree_block_t *block, /* btree short form block pointer */ int level, /* level of the btree block */ xfs_buf_t *bp) /* buffer containing block */ { @@ -310,21 +300,22 @@ xfs_btree_check_sblock( xfs_agf_t *agf; /* ag. freespace structure */ xfs_agblock_t agflen; /* native ag. freespace length */ int sblock_ok; /* block passes checks */ + xfs_btree_sblock_t *sb; /* btree short form block pointer */ agbp = cur->bc_private.a.agbp; agf = XFS_BUF_TO_AGF(agbp); agflen = be32_to_cpu(agf->agf_length); + sb = (xfs_btree_sblock_t *)block; sblock_ok = - be32_to_cpu(block->bb_magic) == xfs_magics[cur->bc_btnum] && - be16_to_cpu(block->bb_level) == level && - be16_to_cpu(block->bb_numrecs) <= - xfs_btree_maxrecs(cur, (xfs_btree_block_t *)block) && - (be32_to_cpu(block->bb_leftsib) == NULLAGBLOCK || - be32_to_cpu(block->bb_leftsib) < agflen) && - block->bb_leftsib && - (be32_to_cpu(block->bb_rightsib) == NULLAGBLOCK || - be32_to_cpu(block->bb_rightsib) < agflen) && - block->bb_rightsib; + be32_to_cpu(sb->bb_magic) == xfs_magics[cur->bc_btnum] && + be16_to_cpu(sb->bb_level) == level && + be16_to_cpu(sb->bb_numrecs) <= xfs_btree_maxrecs(cur, block) && + (be32_to_cpu(sb->bb_leftsib) == NULLAGBLOCK || + be32_to_cpu(sb->bb_leftsib) < agflen) && + sb->bb_leftsib && + (be32_to_cpu(sb->bb_rightsib) == NULLAGBLOCK || + be32_to_cpu(sb->bb_rightsib) < agflen) && + sb->bb_rightsib; if (unlikely(XFS_TEST_ERROR(!sblock_ok, cur->bc_mp, XFS_ERRTAG_BTREE_CHECK_SBLOCK, XFS_RANDOM_BTREE_CHECK_SBLOCK))) { @@ -343,22 +334,105 @@ xfs_btree_check_sblock( int /* error (0 or EFSCORRUPTED) */ xfs_btree_check_sptr( xfs_btree_cur_t *cur, /* btree cursor */ - xfs_agblock_t ptr, /* btree block disk address */ + xfs_btree_ptr_t *ptr, /* btree block disk address */ + int index, /* offset from ptr to check */ int level) /* 
btree block level */ { xfs_buf_t *agbp; /* buffer for ag. freespace struct */ xfs_agf_t *agf; /* ag. freespace structure */ + xfs_agblock_t bno; agbp = cur->bc_private.a.agbp; agf = XFS_BUF_TO_AGF(agbp); - XFS_WANT_CORRUPTED_RETURN( - level > 0 && - ptr != NULLAGBLOCK && ptr != 0 && - ptr < be32_to_cpu(agf->agf_length)); + bno = be32_to_cpu((&ptr->u.s)[index]); + XFS_WANT_CORRUPTED_RETURN(level > 0 && bno != NULLAGBLOCK && + bno != 0 && bno < be32_to_cpu(agf->agf_length)); return 0; } /* + * Get/set/init sibling pointers + */ +void +xfs_btree_get_lsibling( + xfs_btree_cur_t *cur, + xfs_btree_block_t *block, + xfs_btree_ptr_t *ptr, + int lr) +{ + if (lr == XFS_BB_RIGHTSIB) { + ptr->u.l = block->bb_u.l.bb_rightsib; + } else { + ASSERT(lr == XFS_BB_LEFTSIB); + ptr->u.l = block->bb_u.l.bb_leftsib; + } + +} + +void +xfs_btree_set_lsibling( + xfs_btree_cur_t *cur, + xfs_btree_block_t *block, + xfs_btree_ptr_t *ptr, + int lr) +{ + if (lr == XFS_BB_RIGHTSIB) { + block->bb_u.l.bb_rightsib = ptr->u.l; + } else { + ASSERT(lr == XFS_BB_LEFTSIB); + block->bb_u.l.bb_leftsib = ptr->u.l; + } + +} + +void +xfs_btree_get_ssibling( + xfs_btree_cur_t *cur, + xfs_btree_block_t *block, + xfs_btree_ptr_t *ptr, + int lr) +{ + if (lr == XFS_BB_RIGHTSIB) { + ptr->u.s = block->bb_u.s.bb_rightsib; + } else { + ASSERT(lr == XFS_BB_LEFTSIB); + ptr->u.s = block->bb_u.s.bb_leftsib; + } + +} + +void +xfs_btree_set_ssibling( + xfs_btree_cur_t *cur, + xfs_btree_block_t *block, + xfs_btree_ptr_t *ptr, + int lr) +{ + if (lr == XFS_BB_RIGHTSIB) { + block->bb_u.s.bb_rightsib = ptr->u.s; + } else { + ASSERT(lr == XFS_BB_LEFTSIB); + block->bb_u.s.bb_leftsib = ptr->u.s; + } + +} + +/* set up block header and records for new block in split */ +void +xfs_btree_init_sibling( + xfs_btree_cur_t *cur, + xfs_btree_block_t *new, + xfs_btree_block_t *sib) /* sibling block next to new block */ +{ + /* + * Fill in the btree header for the new block. 
+ */ + new->bb_h.bb_magic = cpu_to_be32(XFS_BMAP_MAGIC); + new->bb_h.bb_level = sib->bb_h.bb_level; + new->bb_h.bb_numrecs = 0; +} + +/* * Delete the btree cursor. */ void @@ -625,6 +699,7 @@ xfs_btree_init_cursor( */ cur->bc_private.a.agbp = agbp; cur->bc_private.a.agno = agno; + xfs_alloc_init_cursor(cur); break; case XFS_BTNUM_BMAP: /* @@ -637,6 +712,7 @@ xfs_btree_init_cursor( cur->bc_private.b.allocated = 0; cur->bc_private.b.flags = 0; cur->bc_private.b.whichfork = whichfork; + xfs_bmbt_init_cursor(cur); break; case XFS_BTNUM_INO: /* @@ -644,6 +720,7 @@ xfs_btree_init_cursor( */ cur->bc_private.i.agbp = agbp; cur->bc_private.i.agno = agno; + xfs_inobt_init_cursor(cur); break; default: ASSERT(0); @@ -848,60 +925,70 @@ xfs_btree_reada_bufs( * Read-ahead btree blocks, at the given level. * Bits in lr are set from XFS_BTCUR_{LEFT,RIGHT}RA. */ +STATIC int +xfs_btree_reada_cores( + xfs_btree_cur_t *cur, /* btree cursor */ + int lr, + xfs_agblock_t left, + xfs_agblock_t right) +{ + int rval = 0; + + if ((lr & XFS_BTCUR_LEFTRA) && (left != NULLAGBLOCK)) { + xfs_btree_reada_bufs(cur->bc_mp, + cur->bc_private.a.agno, left, 1); + rval++; + } + if ((lr & XFS_BTCUR_RIGHTRA) && (right != NULLAGBLOCK)) { + xfs_btree_reada_bufs(cur->bc_mp, + cur->bc_private.a.agno, right, 1); + rval++; + } + return rval; +} + +STATIC int +xfs_btree_reada_corel( + xfs_btree_cur_t *cur, /* btree cursor */ + int lr, + xfs_fsblock_t left, + xfs_fsblock_t right) +{ + int rval = 0; + + if ((lr & XFS_BTCUR_LEFTRA) && (left != NULLDFSBNO)) { + xfs_btree_reada_bufl(cur->bc_mp, left, 1); + rval++; + } + if ((lr & XFS_BTCUR_RIGHTRA) && (right != NULLDFSBNO)) { + xfs_btree_reada_bufl(cur->bc_mp, right, 1); + rval++; + } + return rval; +} + int xfs_btree_readahead_core( xfs_btree_cur_t *cur, /* btree cursor */ int lev, /* level in btree */ int lr) /* left/right bits */ { - xfs_alloc_block_t *a; - xfs_bmbt_block_t *b; - xfs_inobt_block_t *i; int rval = 0; ASSERT(cur->bc_bufs[lev] != NULL); cur->bc_ra[lev] 
|= lr; - switch (cur->bc_btnum) { - case XFS_BTNUM_BNO: - case XFS_BTNUM_CNT: - a = XFS_BUF_TO_ALLOC_BLOCK(cur->bc_bufs[lev]); - if ((lr & XFS_BTCUR_LEFTRA) && be32_to_cpu(a->bb_leftsib) != NULLAGBLOCK) { - xfs_btree_reada_bufs(cur->bc_mp, cur->bc_private.a.agno, - be32_to_cpu(a->bb_leftsib), 1); - rval++; - } - if ((lr & XFS_BTCUR_RIGHTRA) && be32_to_cpu(a->bb_rightsib) != NULLAGBLOCK) { - xfs_btree_reada_bufs(cur->bc_mp, cur->bc_private.a.agno, - be32_to_cpu(a->bb_rightsib), 1); - rval++; - } - break; - case XFS_BTNUM_BMAP: - b = XFS_BUF_TO_BMBT_BLOCK(cur->bc_bufs[lev]); - if ((lr & XFS_BTCUR_LEFTRA) && be64_to_cpu(b->bb_leftsib) != NULLDFSBNO) { - xfs_btree_reada_bufl(cur->bc_mp, be64_to_cpu(b->bb_leftsib), 1); - rval++; - } - if ((lr & XFS_BTCUR_RIGHTRA) && be64_to_cpu(b->bb_rightsib) != NULLDFSBNO) { - xfs_btree_reada_bufl(cur->bc_mp, be64_to_cpu(b->bb_rightsib), 1); - rval++; - } - break; - case XFS_BTNUM_INO: - i = XFS_BUF_TO_INOBT_BLOCK(cur->bc_bufs[lev]); - if ((lr & XFS_BTCUR_LEFTRA) && be32_to_cpu(i->bb_leftsib) != NULLAGBLOCK) { - xfs_btree_reada_bufs(cur->bc_mp, cur->bc_private.i.agno, - be32_to_cpu(i->bb_leftsib), 1); - rval++; - } - if ((lr & XFS_BTCUR_RIGHTRA) && be32_to_cpu(i->bb_rightsib) != NULLAGBLOCK) { - xfs_btree_reada_bufs(cur->bc_mp, cur->bc_private.i.agno, - be32_to_cpu(i->bb_rightsib), 1); - rval++; - } - break; - default: - ASSERT(0); + if (XFS_BTREE_LONG_PTRS(cur->bc_btnum)) { + xfs_btree_lblock_t *b; + b = XFS_BUF_TO_LBLOCK(cur->bc_bufs[lev]); + rval = xfs_btree_reada_corel(cur, lr, + be64_to_cpu(b->bb_leftsib), + be64_to_cpu(b->bb_rightsib)); + } else { + xfs_btree_sblock_t *b; + b = XFS_BUF_TO_SBLOCK(cur->bc_bufs[lev]); + rval = xfs_btree_reada_cores(cur, lr, + be32_to_cpu(b->bb_leftsib), + be32_to_cpu(b->bb_rightsib)); } return rval; } Index: 2.6.x-xfs-new/fs/xfs/xfs_btree.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_btree.h 2007-11-02 13:44:45.000000000 +1100 +++ 
2.6.x-xfs-new/fs/xfs/xfs_btree.h 2007-11-06 19:40:29.750668896 +1100 @@ -85,6 +85,43 @@ typedef struct xfs_btree_block { } xfs_btree_block_t; /* + * Generic block, key, ptr and record wrapper structures + * These are disk format structures, and are converted where + * necessary by the btree specific code that needs to interpret + * them. + */ +typedef struct xfs_btree_key { + union { + xfs_bmbt_key_t bmbt; + xfs_bmdr_key_t bmbr; /* bmbt root block */ + xfs_alloc_key_t alloc; + xfs_inobt_key_t inobt; + __be32 s; /* short form key */ + __be64 l; /* long form key */ + } u; +} xfs_btree_key_t; + +typedef struct xfs_btree_ptr { + union { + xfs_bmbt_ptr_t bmbt; + xfs_bmdr_ptr_t bmbr; /* bmbt root block */ + xfs_alloc_ptr_t alloc; + xfs_inobt_ptr_t inobt; + __be32 s; /* short form ptr */ + __be64 l; /* long form ptr */ + } u; +} xfs_btree_ptr_t; + +typedef struct xfs_btree_rec { + union { + xfs_bmbt_rec_t bmbt; + xfs_bmdr_rec_t bmbr; /* bmbt root block */ + xfs_alloc_rec_t alloc; + xfs_inobt_rec_t inobt; + } u; +} xfs_btree_rec_t; + +/* * For logging record fields.
*/ #define XFS_BB_MAGIC 0x01 @@ -136,6 +173,183 @@ extern const __uint32_t xfs_magics[]; #define XFS_BTREE_MAXLEVELS 8 /* max of all btrees */ +typedef const struct xfs_btree_cur_ops { + int (*new_root)(struct xfs_btree_cur *cur, int *stat); + int (*realloc_root)(struct xfs_btree_cur *cur, int index); + int (*kill_root)(struct xfs_btree_cur *cur, int level, + xfs_btree_ptr_t *nptr); + void (*set_root)(struct xfs_btree_cur *cur, + xfs_btree_ptr_t *nptr, int level_change); + int (*update_lastrec)(struct xfs_btree_cur *cur, + xfs_btree_block_t *block); + void (*update_cursor)(struct xfs_btree_cur *src, + struct xfs_btree_cur *dst); +} xfs_btree_curops_t; + +typedef const struct xfs_btree_block_ops { + int (*get_buf)(struct xfs_btree_cur *cur, xfs_btree_ptr_t *ptr, + int flags, struct xfs_buf **bpp); + int (*read_buf)(struct xfs_btree_cur *cur, xfs_btree_ptr_t *ptr, + int flags, struct xfs_buf **bpp); + int (*check_block)(struct xfs_btree_cur *cur, + xfs_btree_block_t *block, + int level, struct xfs_buf *bp); + xfs_btree_block_t * + (*get_block)(struct xfs_btree_cur *cur, int lvl, + struct xfs_buf **bpp); + xfs_btree_block_t * + (*buf_to_block)(struct xfs_btree_cur *cur, struct xfs_buf *bp); + void (*buf_to_ptr)(struct xfs_btree_cur *cur, struct xfs_buf *bp, + xfs_btree_ptr_t *ptr); + void (*log_block)(struct xfs_btree_cur *cur, struct xfs_buf *bp, + int fields); + + int (*alloc_block)(struct xfs_btree_cur *cur, xfs_btree_ptr_t *sbno, + xfs_btree_ptr_t *nbno, int length, int *stat); + int (*free_block)(struct xfs_btree_cur *cur, struct xfs_buf *bp, + int length); + + void (*get_sibling)(struct xfs_btree_cur *cur, + xfs_btree_block_t *block, + xfs_btree_ptr_t *ptr, int lr); + void (*set_sibling)(struct xfs_btree_cur *cur, + xfs_btree_block_t *block, + xfs_btree_ptr_t *ptr, int lr); + void (*init_sibling)(struct xfs_btree_cur *cur, + xfs_btree_block_t *nsib, xfs_btree_block_t *sib); +} xfs_btree_blkops_t; + +typedef const struct xfs_btree_record_ops { + /* records in 
block/level */ + int (*get_minrecs)(struct xfs_btree_cur *cur, int level); + int (*get_maxrecs)(struct xfs_btree_cur *cur, int level); + int (*get_dminrecs)(struct xfs_btree_cur *cur, int level); + int (*get_dmaxrecs)(struct xfs_btree_cur *cur, int level); + int (*get_numrecs)(struct xfs_btree_cur *cur, + xfs_btree_block_t *block); + void (*set_numrecs)(struct xfs_btree_cur *cur, + xfs_btree_block_t *block, + int numrecs); + + /* init values of btree structures */ + void (*init_key_from_rec)(struct xfs_btree_cur *cur, + xfs_btree_key_t *key, xfs_btree_rec_t *rec); + void (*init_ptr_from_cur)(struct xfs_btree_cur *cur, + xfs_btree_ptr_t *ptr); + void (*init_rec_from_key)(struct xfs_btree_cur *cur, + xfs_btree_key_t *key, xfs_btree_rec_t *rec); + void (*init_rec_from_cur)(struct xfs_btree_cur *cur, + xfs_btree_rec_t *rec); + + /* return address of btree structures */ + xfs_btree_key_t * + (*key_addr)(struct xfs_btree_cur *cur, int index, + xfs_btree_block_t *block); + xfs_btree_ptr_t * + (*ptr_addr)(struct xfs_btree_cur *cur, int index, + xfs_btree_block_t *block); + xfs_btree_rec_t * + (*rec_addr)(struct xfs_btree_cur *cur, int index, + xfs_btree_block_t *block); + + /* difference between key value and cursor value */ + int64_t (*key_diff)(struct xfs_btree_cur *cur, xfs_btree_key_t *key); + + xfs_daddr_t + (*ptr_to_daddr)(struct xfs_btree_cur *cur, + xfs_btree_ptr_t *ptr); + + /* set values of btree structures */ + void (*set_key)(struct xfs_btree_cur *cur, + xfs_btree_key_t *key_addr, int index, + xfs_btree_key_t *newkey); + void (*set_ptr)(struct xfs_btree_cur *cur, + xfs_btree_ptr_t *ptr_addr, int index, + xfs_btree_ptr_t *newptr); + void (*set_rec)(struct xfs_btree_cur *cur, + xfs_btree_rec_t *rec_addr, int index, + xfs_btree_rec_t *newrec); + + /* move bits of btree blocks around */ + void (*move_keys)(struct xfs_btree_cur *cur, + xfs_btree_key_t *src_key, + xfs_btree_key_t *dst_key, int src_index, + int dst_index, int numkeys); + void (*move_ptrs)(struct 
xfs_btree_cur *cur, + xfs_btree_ptr_t *src_ptr, + xfs_btree_ptr_t *dst_ptr, int src_index, + int dst_index, int numptrs); + void (*move_recs)(struct xfs_btree_cur *cur, + xfs_btree_rec_t *src_rec, + xfs_btree_rec_t *dst_rec, int src_index, + int dst_index, int numrecs); + + /* log changes to btree structures */ + void (*log_keys)(struct xfs_btree_cur *cur, struct xfs_buf *bp, + int first, int last); + void (*log_ptrs)(struct xfs_btree_cur *cur, struct xfs_buf *bp, + int first, int last); + void (*log_recs)(struct xfs_btree_cur *cur, struct xfs_buf *bp, + int first, int last); + + /* paranoia */ + int (*check_ptrs)(struct xfs_btree_cur *cur, + xfs_btree_ptr_t *ptr, int index, int level); +} xfs_btree_recops_t; + +#ifdef XFS_BTREE_TRACE +typedef const struct xfs_btree_trc_ops { + void (*enter)(const char *func, xfs_btree_cur_t *cur, + char *s, int type, int line, + __psunsigned_t a0, __psunsigned_t a1, + __psunsigned_t a2, __psunsigned_t a3, + __psunsigned_t a4, __psunsigned_t a5, + __psunsigned_t a6, __psunsigned_t a7, + __psunsigned_t a8, __psunsigned_t a9, + __psunsigned_t a10); + void (*cursor)(xfs_btree_cur_t *cur, __uint32_t *s0, + __uint64_t *l0, __uint64_t *l1); + void (*record)(xfs_btree_cur_t *cur, xfs_btree_rec_t *rec, + __uint64_t *l0, __uint64_t *l1, + __uint64_t *l2); +} xfs_btree_trcops_t; + +#define XBT_ENTRY 1 +#define XBT_EXIT 2 +#define XBT_ERROR 3 +#define XBT_ARGS 4 + +/* + * Trace hooks. 
+ * i,j = integer (32 bit) + * b = btree block buffer (xfs_buf_t) + * p = btree ptr + * r = btree record + * k = btree key + */ +#define XFS_BTREE_TRACE_ARGI(c,i) \ + xfs_btree_trace_argi(__FUNCTION__, c, i, __LINE__) +#define XFS_BTREE_TRACE_ARGBI(c,b,i) \ + xfs_btree_trace_argbi(__FUNCTION__, c, b, i, __LINE__) +#define XFS_BTREE_TRACE_ARGBII(c,b,i,j) \ + xfs_btree_trace_argbii(__FUNCTION__, c, b, i, j, __LINE__) +#define XFS_BTREE_TRACE_ARGIPK(c,i,p,s) \ + xfs_btree_trace_argifk(__FUNCTION__, c, i, p, s, __LINE__) +#define XFS_BTREE_TRACE_ARGIPR(c,i,p,r) \ + xfs_btree_trace_argifr(__FUNCTION__, c, i, p, r, __LINE__) +#define XFS_BTREE_TRACE_ARGIK(c,i,k) \ + xfs_btree_trace_argik(__FUNCTION__, c, i, k, __LINE__) +#define XFS_BTREE_TRACE_CURSOR(c,s) \ + xfs_btree_trace_cursor(__FUNCTION__, c, s, __LINE__) +#else +#define XFS_BTREE_TRACE_ARGBI(c,b,i) +#define XFS_BTREE_TRACE_ARGBII(c,b,i,j) +#define XFS_BTREE_TRACE_ARGI(c,i) +#define XFS_BTREE_TRACE_ARGIPK(c,i,p,s) +#define XFS_BTREE_TRACE_ARGIPR(c,i,p,r) +#define XFS_BTREE_TRACE_ARGIK(c,i,k) +#define XFS_BTREE_TRACE_CURSOR(c,s) +#endif /* XFS_BTREE_TRACE */ /* * Btree cursor structure. * This collects all information needed by the btree code in one place. 
@@ -144,6 +358,13 @@ typedef struct xfs_btree_cur { struct xfs_trans *bc_tp; /* transaction we're in, if any */ struct xfs_mount *bc_mp; /* file system mount struct */ + xfs_btree_curops_t *bc_curops; + xfs_btree_blkops_t *bc_blkops; + xfs_btree_recops_t *bc_recops; +#ifdef XFS_BTREE_TRACE + xfs_btree_trcops_t *bc_trcops; +#endif + uint bc_flags; /* btree features - below */ union { xfs_alloc_rec_incore_t a; xfs_bmbt_irec_t b; @@ -179,6 +400,10 @@ typedef struct xfs_btree_cur } bc_private; /* per-btree type data */ } xfs_btree_cur_t; +/* cursor flags */ +#define XFS_BTREE_ROOT_IN_INODE (1<<0) /* root may be variable size */ +#define XFS_BTREE_LASTREC_UPDATE (1<<1) /* track last rec externally */ + #define XFS_BTREE_NOERROR 0 #define XFS_BTREE_ERROR 1 @@ -192,6 +417,17 @@ typedef struct xfs_btree_cur #ifdef __KERNEL__ +#define XFS_BTREE_TRACE_ARGBI(c,b,i) +#define XFS_BTREE_TRACE_ARGBII(c,b,i,j) +#define XFS_BTREE_TRACE_ARGFFF(c,o,b,i) +#define XFS_BTREE_TRACE_ARGFFFI(c,o,b,i,j) +#define XFS_BTREE_TRACE_ARGI(c,i) +#define XFS_BTREE_TRACE_ARGII(c,i,j) +#define XFS_BTREE_TRACE_ARGIFK(c,i,f,s) +#define XFS_BTREE_TRACE_ARGIFR(c,i,f,r) +#define XFS_BTREE_TRACE_ARGIK(c,i,k) +#define XFS_BTREE_TRACE_CURSOR(c,s) + #ifdef DEBUG /* * Debug routine: check that block header is ok. 
@@ -232,7 +468,7 @@ xfs_btree_check_rec( int /* error (0 or EFSCORRUPTED) */ xfs_btree_check_lblock( xfs_btree_cur_t *cur, /* btree cursor */ - xfs_btree_lblock_t *block, /* btree long form block pointer */ + xfs_btree_block_t *block, /* btree long form block pointer */ int level, /* level of the btree block */ struct xfs_buf *bp); /* buffer containing block, if any */ @@ -242,19 +478,17 @@ xfs_btree_check_lblock( int /* error (0 or EFSCORRUPTED) */ xfs_btree_check_lptr( xfs_btree_cur_t *cur, /* btree cursor */ - xfs_dfsbno_t ptr, /* btree block disk address */ + xfs_btree_ptr_t *ptr, /* btree block ptr */ + int offset, /* offset from ptr to check */ int level); /* btree block level */ -#define xfs_btree_check_lptr_disk(cur, ptr, level) \ - xfs_btree_check_lptr(cur, be64_to_cpu(ptr), level) - /* * Checking routine: check that short form block header is ok. */ int /* error (0 or EFSCORRUPTED) */ xfs_btree_check_sblock( xfs_btree_cur_t *cur, /* btree cursor */ - xfs_btree_sblock_t *block, /* btree short form block pointer */ + xfs_btree_block_t *block, /* btree short form block pointer */ int level, /* level of the btree block */ struct xfs_buf *bp); /* buffer containing block */ @@ -264,7 +498,8 @@ xfs_btree_check_sblock( int /* error (0 or EFSCORRUPTED) */ xfs_btree_check_sptr( xfs_btree_cur_t *cur, /* btree cursor */ - xfs_agblock_t ptr, /* btree block disk address */ + xfs_btree_ptr_t *ptr, /* btree block ptr */ + int offset, /* offset from ptr to check */ int level); /* btree block level */ /* @@ -423,12 +658,52 @@ xfs_btree_readahead( int lev, /* level in btree */ int lr) /* left/right bits */ { + if ((cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) && + (lev == cur->bc_nlevels - 1)) + return 0; + if ((cur->bc_ra[lev] | lr) == cur->bc_ra[lev]) return 0; return xfs_btree_readahead_core(cur, lev, lr); } +/* + * Block sibling operations. 
+ */ +void +xfs_btree_get_lsibling( + xfs_btree_cur_t *cur, + xfs_btree_block_t *block, + xfs_btree_ptr_t *ptr, + int lr); + +void +xfs_btree_set_lsibling( + xfs_btree_cur_t *cur, + xfs_btree_block_t *block, + xfs_btree_ptr_t *ptr, + int lr); + +void +xfs_btree_get_ssibling( + xfs_btree_cur_t *cur, + xfs_btree_block_t *block, + xfs_btree_ptr_t *ptr, + int lr); + +void +xfs_btree_set_ssibling( + xfs_btree_cur_t *cur, + xfs_btree_block_t *block, + xfs_btree_ptr_t *ptr, + int lr); + +void +xfs_btree_init_sibling( + xfs_btree_cur_t *cur, + xfs_btree_block_t *nsib, + xfs_btree_block_t *sib); /* sibling block next to new block */ /* * Set the buffer for level "lev" in the cursor to bp, releasing @@ -440,6 +715,136 @@ xfs_btree_setbuf( int lev, /* level in btree */ struct xfs_buf *bp); /* new buffer to set */ +/* + * Core btree functions + */ + +/* + * Insert one record/level. Return information to the caller + * allowing the next level up to proceed if necessary. + */ +int xfs_btree_insrec( + xfs_btree_cur_t *cur, + int level, + xfs_btree_ptr_t *ptrp, + xfs_btree_rec_t *recp, + xfs_btree_cur_t **curp, + int *stat); /* no-go/done/continue */ + +/* + * Delete record pointed to by cur/level. + */ +int xfs_btree_delrec( + xfs_btree_cur_t *cur, + int level, + int *stat); /* success/failure */ + +/* + * Move 1 record right from cur/level if possible. + * Update cur to reflect the new path. + */ +int xfs_btree_rshift( + xfs_btree_cur_t *cur, + int level, + int *stat); /* success/failure */ + +/* + * Move 1 record left from cur/level if possible. + * Update cur to reflect the new path. + */ +int xfs_btree_lshift( + xfs_btree_cur_t *cur, + int level, + int *stat); /* success/failure */ + +/* + * Split cur/level block in half. + * Return new block number and the key to its + * first record (to be inserted into parent). 
+ */ +int /* error */ +xfs_btree_split( + xfs_btree_cur_t *cur, + int level, + xfs_btree_ptr_t *ptrp, + xfs_btree_key_t *key, + xfs_btree_cur_t **curp, + int *stat); /* success/failure */ + +/* + * Update keys for the record. + */ +int +xfs_btree_updkey( + xfs_btree_cur_t *cur, + xfs_btree_key_t *keyp, /* on-disk format */ + int level); + +/* + * Decrement cursor by one record at the level. + * For nonzero levels the leaf-ward information is untouched. + */ +int /* error */ +xfs_btree_decrement( + xfs_btree_cur_t *cur, + int level, + int *stat); /* success/failure */ + +/* + * Increment cursor by one record at the level. + * For nonzero levels the leaf-ward information is untouched. + */ +int /* error */ +xfs_btree_increment( + xfs_btree_cur_t *cur, + int level, + int *stat); /* success/failure */ + +/* + * Insert the record in cur at the point referenced by cur. + * The cursor may be inconsistent on return if splits have been done. + */ +int +xfs_btree_insert( + xfs_btree_cur_t *cur, + int *stat); + +/* + * Delete the record pointed to by cur. + */ +int /* error */ +xfs_btree_delete( + xfs_btree_cur_t *cur, + int *stat); /* success/failure */ + +/* + * Lookup the record. The cursor is made to point to it, based on dir. + * Return 0 if can't find any such record, 1 for success. + */ +int /* error */ +xfs_btree_lookup( + xfs_btree_cur_t *cur, /* btree cursor */ + xfs_lookup_t dir, /* <=, ==, or >= */ + int *stat); /* success/failure */ + +/* + * Allocate a new root block, fill it in. + */ +int /* error */ +xfs_btree_newroot( + xfs_btree_cur_t *cur, /* btree cursor */ + int *stat); /* success/failure */ + +/* + * Update the record referred to by cur to the value in the + * given record. This either works (return 0) or gets an + * EFSCORRUPTED error. 
+ */ +int +xfs_btree_update( + xfs_btree_cur_t *cur, + xfs_btree_rec_t *rec); + #endif /* __KERNEL__ */ Index: 2.6.x-xfs-new/fs/xfs/xfs_btree_core.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ 2.6.x-xfs-new/fs/xfs/xfs_btree_core.c 2007-11-06 19:40:29.758667866 +1100 @@ -0,0 +1,2299 @@ +/* + * Copyright (c) 2007 Silicon Graphics, Inc. + * All Rights Reserved. + * + * Derived from existing XFS btree code by Dave Chinner. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it would be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write the Free Software Foundation, + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_types.h" +#include "xfs_bit.h" +#include "xfs_log.h" +#include "xfs_inum.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_ag.h" +#include "xfs_dir2.h" +#include "xfs_dmapi.h" +#include "xfs_mount.h" +#include "xfs_bmap_btree.h" +#include "xfs_alloc_btree.h" +#include "xfs_ialloc_btree.h" +#include "xfs_dir2_sf.h" +#include "xfs_attr_sf.h" +#include "xfs_dinode.h" +#include "xfs_inode.h" +#include "xfs_btree.h" +#include "xfs_ialloc.h" +#include "xfs_error.h" + +/* + * ToDo: + * + * - trace infrastructure + * - fix 32bit-ness in xfs_btree_newroot + * - per-btree stats + * - fix check_sblock/sptr as they are alloc btree specific + */ + +/* + * Keys, ptrs and records are supposed to be passed around in host + * format in this code. 
type specific callouts need to do endian + * swapping as necessary. + */ + +/* + * Btree keys, ptrs and records are passed around in disk format + * and converted where needed by end functions. The values held in + * the cursor for anything is in host order. + */ + +/* + * Internal functions. + */ +STATIC int +xfs_btree_ptr_null( + xfs_btree_cur_t *cur, + xfs_btree_ptr_t *ptr) +{ + switch(cur->bc_btnum) { + case XFS_BTNUM_BNO: + case XFS_BTNUM_CNT: + return be32_to_cpu(ptr->u.alloc) == NULLAGBLOCK; + break; + case XFS_BTNUM_INO: + return be32_to_cpu(ptr->u.inobt) == NULLAGBLOCK; + case XFS_BTNUM_BMAP: + return be64_to_cpu(ptr->u.bmbt) == NULLFSBLOCK; + default: + ASSERT(0); + break; + } + return 0; +} + +STATIC void +xfs_btree_set_ptr_null( + xfs_btree_cur_t *cur, + xfs_btree_ptr_t *ptr) +{ + switch(cur->bc_btnum) { + case XFS_BTNUM_BNO: + case XFS_BTNUM_CNT: + ptr->u.alloc = cpu_to_be32(NULLAGBLOCK); + break; + case XFS_BTNUM_INO: + ptr->u.inobt = cpu_to_be32(NULLAGBLOCK); + break; + case XFS_BTNUM_BMAP: + ptr->u.bmbt = cpu_to_be64(NULLFSBLOCK); + break; + default: + ASSERT(0); + break; + } +} + +STATIC int +xfs_btree_dec_cursor( + xfs_btree_cur_t *cur, + int level, + int *stat) +{ + int i; + int error; + + if (level > 0) { + error = xfs_btree_decrement(cur, level, &i); + if (error) + return error; + } + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + *stat = 1; + return 0; +} + +/* + * Return true if ptr is the last record in the btree and + * we need to track updates to this record.
+ */ +STATIC int +xfs_btree_is_lastrec( + xfs_btree_cur_t *cur, + xfs_btree_block_t *block, + int level, + int last) +{ + xfs_btree_blkops_t *bops = cur->bc_blkops; + xfs_btree_recops_t *rops = cur->bc_recops; + xfs_btree_ptr_t ptr; + int numrecs; + + numrecs = rops->get_numrecs(cur, block); + bops->get_sibling(cur, block, &ptr, XFS_BB_RIGHTSIB); + return ((cur->bc_flags & XFS_BTREE_LASTREC_UPDATE) && + level == 0 && + xfs_btree_ptr_null(cur, &ptr) && + last >= numrecs); + +} + +/* + * Move numrecs from the src block to the dst block. + * Log the changes to the destination block. + */ +STATIC int +xfs_btree_move_entries( + xfs_btree_cur_t *cur, + int level, + xfs_buf_t *sbp, /* source block */ + xfs_buf_t *dbp, /* destination block */ + int src_index, /* src block index */ + int dst_index, /* dst block index */ + int numrecs) /* number of records to move */ +{ + xfs_btree_blkops_t *bops = cur->bc_blkops; + xfs_btree_recops_t *rops = cur->bc_recops; + xfs_btree_block_t *src; /* src btree block */ + xfs_btree_key_t *skp; /* src btree key */ + xfs_btree_ptr_t *spp; /* src address pointer */ + xfs_btree_rec_t *srp; /* src record pointer */ + xfs_btree_block_t *dst; /* dst btree block */ + xfs_btree_key_t *dkp; /* dst btree key */ + xfs_btree_ptr_t *dpp; /* dst address pointer */ + xfs_btree_rec_t *drp; /* dst record pointer */ + + src = bops->buf_to_block(cur, sbp); + dst = bops->buf_to_block(cur, dbp); + if (level > 0) { + /* + * It's a non-leaf. Move keys and pointers. 
+ */ + skp = rops->key_addr(cur, src_index, src); + spp = rops->ptr_addr(cur, src_index, src); + dkp = rops->key_addr(cur, dst_index, dst); + dpp = rops->ptr_addr(cur, dst_index, dst); +#ifdef DEBUG + for (i = ptr; i < numrecs; i++) { + error = bops->check_lptr_disk(cur, rpp, i, level); + if (error) + goto error0; + } +#endif + rops->move_keys(cur, skp, dkp, 0, 0, numrecs); + rops->move_ptrs(cur, spp, dpp, 0, 0, numrecs); + + rops->log_keys(cur, dbp, dst_index, numrecs); + rops->log_ptrs(cur, dbp, dst_index, numrecs); + } else { + /* + * It's a leaf. Move records. + */ + srp = rops->rec_addr(cur, src_index, src); + drp = rops->rec_addr(cur, dst_index, dst); + rops->move_recs(cur, srp, drp, 0, 0, numrecs); + rops->log_recs(cur, dbp, dst_index, numrecs); + } + rops->set_numrecs(cur, dst, rops->get_numrecs(cur, dst) + numrecs); + bops->log_block(cur, dbp, XFS_BB_NUMRECS); +#ifdef DEBUG + if (level > 0) + xfs_btree_check_key(cur->bc_btnum, lkp - 1, lkp); + else + xfs_btree_check_rec(cur->bc_btnum, lrp - 1, lrp); +#endif + return 0; +} +/* + * Excise the entries indicated by the start, end. + * Simply slide the entries past them down. + * Log the changed areas of the block. + */ +STATIC int +xfs_btree_remove_entry( + xfs_btree_cur_t *cur, + int level, + xfs_buf_t *bp, + xfs_btree_key_t *key, + int index) /* index to excise */ +{ + xfs_btree_blkops_t *bops = cur->bc_blkops; + xfs_btree_recops_t *rops = cur->bc_recops; + xfs_btree_block_t *block; /* bmap btree block */ + xfs_btree_key_t *kp=NULL; /* pointer to bmap btree key */ + xfs_btree_ptr_t *pp; /* pointer to bmap block addr */ + xfs_btree_rec_t *rp; /* pointer to bmap btree rec */ + int numrecs; + + block = bops->buf_to_block(cur, bp); + numrecs = rops->get_numrecs(cur, block); + if (level > 0) { + /* + * It's a nonleaf. Excise the key and ptr being deleted, by + * sliding the entries past them down one. Log the changed + * areas of the block. 
+ */ + kp = rops->key_addr(cur, 1, block); + pp = rops->ptr_addr(cur, 1, block); +#ifdef DEBUG + for (i = index; i < numrecs; i++) { + error = cur->b_ops->check_lptr_disk(cur, pp, i, level); + if (error) + goto error0; + } +#endif + if (index < numrecs) { + rops->move_keys(cur, kp, NULL, index, index - 1, numrecs - index); + rops->move_ptrs(cur, pp, NULL, index, index - 1, numrecs - index); + rops->log_ptrs(cur, bp, index, numrecs - 1); + rops->log_keys(cur, bp, index, numrecs - 1); + } + } else { + /* + * It's a leaf. Excise the record being deleted, by sliding + * the entries past it down one. Log the changed areas of the + * block. + */ + rp = rops->rec_addr(cur, 1, block); + if (index < numrecs) { + rops->move_recs(cur, rp, NULL, index, index - 1, numrecs - index); + rops->log_recs(cur, bp, index, numrecs - 1); + } + /* + * If it's the first record in the block, we'll need a key + * structure to pass up to the next level (updkey). + */ + if (index == 1) + rops->init_key_from_rec(cur, key, rp); + } + numrecs--; + rops->set_numrecs(cur, block, numrecs); + bops->log_block(cur, bp, XFS_BB_NUMRECS); + return 0; +} + +/* + * Insert the entry indicated by the start index + * Simply slide the entries up one, insert the new entry and + * Log the changed areas of the block. + */ +STATIC int +xfs_btree_insert_entry( + xfs_btree_cur_t *cur, + int level, + xfs_buf_t *bp, + int index, /* index to insert at */ + xfs_btree_key_t *key, + xfs_btree_ptr_t *ptr, + xfs_btree_rec_t *rec) +{ + xfs_btree_blkops_t *bops = cur->bc_blkops; + xfs_btree_recops_t *rops = cur->bc_recops; + xfs_btree_block_t *block; + xfs_btree_key_t *kp; + xfs_btree_ptr_t *pp; + xfs_btree_rec_t *rp; + int numrecs; + + block = bops->buf_to_block(cur, bp); + numrecs = rops->get_numrecs(cur, block); + if (level > 0) { + /* + * It's a non-leaf entry. Make a hole for the new data + * in the key and ptr regions of the block.
+ */ + kp = rops->key_addr(cur, 1, block); + pp = rops->ptr_addr(cur, 1, block); +#ifdef DEBUG + for (i = numrecs; i >= index; i--) { + error = bops->check_lptr_disk(cur, pp, i - 1, level); + if (error) + goto error0; + } +#endif + rops->move_keys(cur, kp, NULL, index - 1, index, + numrecs - index + 1); + rops->move_ptrs(cur, pp, NULL, index - 1, index, + numrecs - index + 1); + /* + * Now stuff the new data in, bump numrecs and log the new data. + */ +#ifdef DEBUG + error = bops->check_lptr_disk(cur, ptr, 0, level); + if (error) + goto error0; +#endif + rops->set_key(cur, kp, index - 1, key); + rops->set_ptr(cur, pp, index - 1, ptr); + numrecs++; + rops->set_numrecs(cur, block, numrecs); + rops->log_ptrs(cur, bp, index, numrecs); + rops->log_keys(cur, bp, index, numrecs); + } else { + /* + * It's a leaf entry. Make a hole for the new record. + */ + rp = rops->rec_addr(cur, 1, block); + rops->move_recs(cur, rp, NULL, index - 1, index, + numrecs - index + 1); + /* + * Now stuff the new record in, bump numrecs + * and log the new data. + */ + rops->set_rec(cur, rp, index - 1, rec); + numrecs++; + rops->set_numrecs(cur, block, numrecs); + rops->log_recs(cur, bp, index, numrecs); + } + /* + * Log the new number of records in the btree header. + */ + bops->log_block(cur, bp, XFS_BB_NUMRECS); + +#ifdef DEBUG + /* + * Check that the key/record is in the right place, now. + */ + if (ptr < numrecs) { + if (level == 0) + xfs_btree_check_rec(cur->bc_btnum, rp + index - 1, + rp + index); + else + xfs_btree_check_key(cur->bc_btnum, kp + index - 1, + kp + index); + } +#endif + return 0; +} + +/* + * Single level of the btree record deletion routine. + * Delete record pointed to by cur/level. + * Remove the record from its block then rebalance the tree. + * Return 0 for error, 1 for done, 2 to go on to the next level. 
+ */ +int /* error */ +xfs_btree_delrec( + xfs_btree_cur_t *cur, /* btree cursor */ + int level, /* level removing record from */ + int *stat) /* fail/done/go-on */ +{ + xfs_btree_block_t *block; /* bmap btree block */ + xfs_btree_ptr_t cptr; /* current block ptr */ + xfs_buf_t *bp; /* buffer for block */ + int error; /* error return value */ + int i; /* loop counter */ + xfs_btree_key_t key; /* bmap btree key */ + xfs_btree_key_t *kp=NULL; /* pointer to bmap btree key */ + xfs_btree_ptr_t lptr; /* left sibling block ptr */ + xfs_buf_t *lbp; /* left buffer pointer */ + xfs_btree_block_t *left; /* left btree block */ + int lrecs=0; /* left record count */ + int ptr; /* key/record index */ + xfs_btree_ptr_t rptr; /* right sibling block ptr */ + xfs_buf_t *rbp; /* right buffer pointer */ + xfs_btree_block_t *right; /* right btree block */ + xfs_btree_block_t *rrblock; /* right-right btree block */ + xfs_buf_t *rrbp; /* right-right buffer pointer */ + int rrecs=0; /* right record count */ + xfs_btree_cur_t *tcur; /* temporary btree cursor */ + int numrecs; /* temporary numrec count */ + xfs_btree_curops_t *cops = cur->bc_curops; + xfs_btree_blkops_t *bops = cur->bc_blkops; + xfs_btree_recops_t *rops = cur->bc_recops; + + XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY); + XFS_BTREE_TRACE_ARGI(cur, level); + tcur = NULL; + + /* + * Get the index of the entry being deleted, check for nothing there. + */ + ptr = cur->bc_ptrs[level]; + if (ptr == 0) { + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + *stat = 0; + return 0; + } + + /* + * Get the buffer & block containing the record or key/ptr. + */ + block = bops->get_block(cur, level, &bp); + numrecs = rops->get_numrecs(cur, block); +#ifdef DEBUG + error = bops->check_block(cur, block, level, bp); + if (error) + goto error0; +#endif + /* + * Fail if we're off the end of the block. 
+ */ + if (ptr > numrecs) { + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + *stat = 0; + return 0; + } + + XFS_STATS_INC(xs_bmbt_delrec); + + /* + * Excise the entries being deleted. + * Log the changed areas of the block. + */ + error = xfs_btree_remove_entry(cur, level, bp, &key, ptr); + if (error) + goto error0; + + /* + * If we are tracking the last record in the tree and + * we are at the far right edge of the tree, update it. + */ + numrecs = rops->get_numrecs(cur, block); + if (xfs_btree_is_lastrec(cur, block, level, ptr)) { + ASSERT(ptr == numrecs + 1); + error = cops->update_lastrec(cur, block); + if (error) + goto error0; + } + + /* + * We're at the root level. + * First, shrink the root block in-memory. + * Try to get rid of the next level down. + * If we can't then there's nothing left to do. + */ + if (level == cur->bc_nlevels - 1) { + /* root in inode is special */ + if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) { + cops->realloc_root(cur, -1); + error = cops->kill_root(cur, -1, NULL); + if (!error) + error = xfs_btree_dec_cursor(cur, level, stat); + if (error) + goto error0; + } + /* + * If this is the root level, and there's only one entry left, + * and it's NOT the leaf level, then we can get rid of this + * level. + */ + else if (numrecs == 1 && level > 0) { + xfs_btree_ptr_t *pp; + /* + * pp is still set to the first pointer in the block. + * Make it the new root of the btree. + */ + pp = rops->ptr_addr(cur, 1, block); + error = cops->kill_root(cur, level, pp); + if (error) + goto error0; + } else if (level > 0) { + error = xfs_btree_dec_cursor(cur, level, stat); + if (error) + goto error0; + } + *stat = 1; + return 0; + } + + /* + * If we deleted the leftmost entry in the block, update the + * key values above us in the tree. + */ + if (ptr == 1) { + error = xfs_btree_updkey(cur, kp, level + 1); + if (error) + goto error0; + } + + /* + * If the number of records remaining in the block is at least + * the minimum, we're done. 
+ */ + if (numrecs >= rops->get_minrecs(cur, level)) { + error = xfs_btree_dec_cursor(cur, level, stat); + if (error) + goto error0; + return 0; + } + + /* + * Otherwise, we have to move some records around to keep the + * tree balanced. Look at the left and right sibling blocks to + * see if we can re-balance by moving only one record. + */ + bops->get_sibling(cur, block, &rptr, XFS_BB_RIGHTSIB); + bops->get_sibling(cur, block, &lptr, XFS_BB_LEFTSIB); + if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) { + /* + * One child of root, need to get a chance to copy its contents + * into the root and delete it. Can't go up to next level, + * there's nothing to delete there. + */ + if (xfs_btree_ptr_null(cur, &rptr) && + xfs_btree_ptr_null(cur, &lptr) && + level == cur->bc_nlevels - 2) { + error = cops->kill_root(cur, -1, NULL); + if (!error) + error = xfs_btree_dec_cursor(cur, level, stat); + if (error) + goto error0; + return 0; + } + } + ASSERT(!xfs_btree_ptr_null(cur, &rptr) || + !xfs_btree_ptr_null(cur, &lptr)); + + /* + * Duplicate the cursor so our btree manipulations here won't + * disrupt the next level up. + */ + error = xfs_btree_dup_cursor(cur, &tcur); + if (error) + goto error0; + + /* + * If there's a right sibling, see if it's ok to shift an entry + * out of it. + */ + if (!xfs_btree_ptr_null(cur, &rptr)) { + /* + * Move the temp cursor to the last entry in the next block. + * Actually any entry but the first would suffice. + */ + i = xfs_btree_lastrec(tcur, level); + XFS_WANT_CORRUPTED_GOTO(i == 1, error0); + + error = xfs_btree_increment(tcur, level, &i); + if (error) + goto error0; + XFS_WANT_CORRUPTED_GOTO(i == 1, error0); + + i = xfs_btree_lastrec(tcur, level); + XFS_WANT_CORRUPTED_GOTO(i == 1, error0); + + /* + * Grab a pointer to the block. 
+ */ + rbp = tcur->bc_bufs[level]; + right = bops->buf_to_block(tcur, rbp); +#ifdef DEBUG + error = bops->check_block(tcur, right, level, rbp); + if (error) + goto error0; +#endif + /* + * Grab the current block number, for future use. + */ + bops->get_sibling(tcur, right, &cptr, XFS_BB_LEFTSIB); + /* + * If right block is full enough so that removing one entry + * won't make it too empty, and left-shifting an entry out + * of right to us works, we're done. + */ + if (rops->get_numrecs(tcur, right) - 1 >= + rops->get_minrecs(tcur, level)) { + error = xfs_btree_lshift(tcur, level, &i); + if (error) + goto error0; + if (i) { + ASSERT(rops->get_numrecs(tcur, block) >= + rops->get_minrecs(tcur, level)); + xfs_btree_del_cursor(tcur, XFS_BTREE_NOERROR); + tcur = NULL; + error = xfs_btree_dec_cursor(cur, level, stat); + if (error) + goto error0; + return 0; + } + } + /* + * Otherwise, grab the number of records in right for + * future reference, and fix up the temp cursor to point + * to our block again (last record). + */ + rrecs = rops->get_numrecs(tcur, right); + if (!xfs_btree_ptr_null(cur, &lptr)) { + i = xfs_btree_firstrec(tcur, level); + XFS_WANT_CORRUPTED_GOTO(i == 1, error0); + + error = xfs_btree_decrement(tcur, level, &i); + if (error) + goto error0; + XFS_WANT_CORRUPTED_GOTO(i == 1, error0); + } + } + /* + * If there's a left sibling, see if it's ok to shift an entry + * out of it. + */ + if (!xfs_btree_ptr_null(cur, &lptr)) { + /* + * Move the temp cursor to the first entry in the + * previous block. + */ + i = xfs_btree_firstrec(tcur, level); + XFS_WANT_CORRUPTED_GOTO(i == 1, error0); + + error = xfs_btree_decrement(tcur, level, &i); + if (error) + goto error0; + i = xfs_btree_firstrec(tcur, level); + XFS_WANT_CORRUPTED_GOTO(i == 1, error0); + + /* + * Grab a pointer to the block. 
+ */ + lbp = tcur->bc_bufs[level]; + left = bops->buf_to_block(cur, lbp); +#ifdef DEBUG + error = bops->check_block(cur, left, level, lbp); + if (error) + goto error0; +#endif + /* + * Grab the current block number, for future use. + */ + bops->get_sibling(tcur, left, &cptr, XFS_BB_RIGHTSIB); + /* + * If left block is full enough so that removing one entry + * won't make it too empty, and right-shifting an entry out + * of left to us works, we're done. + */ + if (rops->get_numrecs(tcur, left) - 1 >= + rops->get_minrecs(tcur, level)) { + error = xfs_btree_rshift(tcur, level, &i); + if (error) + goto error0; + if (i) { + ASSERT(rops->get_numrecs(tcur, block) >= + rops->get_minrecs(tcur, level)); + xfs_btree_del_cursor(tcur, XFS_BTREE_NOERROR); + tcur = NULL; + if (level == 0) + cur->bc_ptrs[0]++; + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + *stat = 1; + return 0; + } + } + /* + * Otherwise, grab the number of records in left for + * future reference. + */ + lrecs = rops->get_numrecs(tcur, left); + } + /* + * Delete the temp cursor, we're done with it. + */ + xfs_btree_del_cursor(tcur, XFS_BTREE_NOERROR); + tcur = NULL; + + /* + * If here, we need to do a join to keep the tree balanced. + */ + ASSERT(!xfs_btree_ptr_null(cur, &cptr)); + if (!xfs_btree_ptr_null(cur, &lptr) && + ((lrecs + rops->get_numrecs(cur, block)) <= + (rops->get_maxrecs(cur, level)))) { + /* + * Set "right" to be the starting block, + * "left" to be the left neighbor. + */ + rptr = cptr; + right = block; + rbp = bp; + error = bops->read_buf(cur, &lptr, 0, &lbp); + if (error) + goto error0; + left = bops->buf_to_block(cur, lbp); + error = bops->check_block(cur, left, level, lbp); + if (error) + goto error0; + } + /* + * If that won't work, see if we can join with the right neighbor block.
+ */ + else if (!xfs_btree_ptr_null(cur, &rptr) && + ((rrecs + rops->get_numrecs(cur, block)) <= + (rops->get_maxrecs(cur, level)))) { + /* + * Set "left" to be the starting block, + * "right" to be the right neighbor. + */ + lptr = cptr; + left = block; + lbp = bp; + error = bops->read_buf(cur, &rptr, 0, &rbp); + if (error) + goto error0; + right = bops->buf_to_block(cur, rbp); + error = bops->check_block(cur, right, level, rbp); + if (error) + goto error0; + lrecs = rops->get_numrecs(cur, left); + } + /* + * Otherwise, we can't fix the imbalance. + * Just return. This is probably a logic error, but it's not fatal. + */ + else { + error = xfs_btree_dec_cursor(cur, level, stat); + if (error) + goto error0; + return 0; + } + /* + * We're now going to join "left" and "right" by moving all the stuff + * in "right" to "left" and deleting "right". + */ + error = xfs_btree_move_entries(cur, level, rbp, lbp, 1, lrecs + 1, rrecs); + if (error) + goto error0; + + /* + * Fix up the right block pointer in the surviving block, and log it. + */ + bops->get_sibling(cur, right, &cptr, XFS_BB_RIGHTSIB); + bops->set_sibling(cur, left, &cptr, XFS_BB_RIGHTSIB); + bops->log_block(cur, lbp, XFS_BB_NUMRECS | XFS_BB_RIGHTSIB); + + /* + * If there is a right sibling now, make it point to the + * remaining block. + */ + bops->get_sibling(cur, left, &cptr, XFS_BB_RIGHTSIB); + if (!xfs_btree_ptr_null(cur, &cptr)) { + error = bops->read_buf(cur, &cptr, 0, &rrbp); + if (error) + goto error0; + rrblock = bops->buf_to_block(cur, rrbp); + error = bops->check_block(cur, rrblock, level, rrbp); + if (error) + goto error0; + bops->set_sibling(cur, rrblock, &lptr, XFS_BB_LEFTSIB); + bops->log_block(cur, rrbp, XFS_BB_LEFTSIB); + } + /* + * Free the deleted block. + */ + error = bops->free_block(cur, rbp, 1); + if (error) + goto error0; + + /* + * If we joined with the left neighbor, set the buffer in the + * cursor to the left block, and fix up the index.
+ */ + if (bp != lbp) { + cur->bc_bufs[level] = lbp; + cur->bc_ptrs[level] += lrecs; + cur->bc_ra[level] = 0; + } + /* + * If we joined with the right neighbor and there's a level above + * us, increment the cursor at that level. + */ + else if ((cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) || + (level + 1 < cur->bc_nlevels)) { + error = xfs_btree_increment(cur, level + 1, &i); + if (error) + goto error0; + } + + /* + * Readjust the ptr at this level if it's not a leaf, since it's + * still pointing at the deletion point, which makes the cursor + * inconsistent. If this makes the ptr 0, the caller fixes it up. + * We can't use decrement because it would change the next level up. + */ + if (level > 0) + cur->bc_ptrs[level]--; + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + /* + * Return value means the next level up has something to do. + */ + *stat = 2; + return 0; + +error0: + XFS_BTREE_TRACE_CURSOR(cur, XBT_ERROR); + if (tcur) + xfs_btree_del_cursor(tcur, XFS_BTREE_ERROR); + return error; +} + +STATIC int +xfs_btree_make_block_unfull( + xfs_btree_cur_t *cur, /* btree cursor */ + int level, /* btree level */ + int numrecs, /* # of recs in block */ + int *oindex, /* old tree index */ + int *index, /* new tree index */ + xfs_btree_ptr_t *nptr, /* new btree ptr */ + xfs_btree_cur_t **ncur, /* new btree cursor */ + xfs_btree_rec_t *nrec, /* new record */ + int *stat) +{ + xfs_btree_curops_t *cops = cur->bc_curops; + xfs_btree_recops_t *rops = cur->bc_recops; + xfs_btree_key_t key; /* new btree key value */ + int error = 0; + + if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) { + if (numrecs < rops->get_dmaxrecs(cur, level)) { + /* A resizeable root block that can be made bigger. */ + cops->realloc_root(cur, 1); + return 0; + } + if (level == cur->bc_nlevels - 1) { + /* A root block that needs replacing */ + error = cops->new_root(cur, stat); + if (error || *stat == 0) + return error; + return 0; + } + } + + /* + * First, try shifting an entry to the right neighbor. 
+ */ + error = xfs_btree_rshift(cur, level, stat); + if (error) + return error; + if (*stat) { + /* nothing */ + } else { + /* + * Next, try shifting an entry to the left neighbor. + */ + error = xfs_btree_lshift(cur, level, stat); + if (error) + return error; + if (*stat) { + *oindex = *index = cur->bc_ptrs[level]; + } else { + /* + * Next, try splitting the current block in half. If + * this works we have to re-set our variables because + * we could be in a different block now. + */ + error = xfs_btree_split(cur, level, nptr, &key, + ncur, stat); + if (error || *stat == 0) + return error; + + *index = cur->bc_ptrs[level]; + rops->init_rec_from_key(cur, &key, nrec); + } + } + return 0; +} + +/* + * Insert one record/level. Return information to the caller + * allowing the next level up to proceed if necessary. + */ +int +xfs_btree_insrec( + xfs_btree_cur_t *cur, /* btree cursor */ + int level, /* level to insert record at */ + xfs_btree_ptr_t *ptrp, /* i/o: block number inserted */ + xfs_btree_rec_t *recp, /* i/o: record data inserted */ + xfs_btree_cur_t **curp, /* output: new cursor replacing cur */ + int *stat) /* success/failure */ +{ + xfs_btree_block_t *block; /* bmap btree block */ + xfs_buf_t *bp; /* buffer for block */ + int error; /* error return value */ + int i; /* loop index */ + xfs_btree_key_t key; /* bmap btree key */ + xfs_btree_ptr_t nptr; /* new block ptr */ + struct xfs_btree_cur *ncur; /* new btree cursor */ + xfs_btree_rec_t nrec; /* new record count */ + int optr; /* old key/record index */ + int ptr; /* key/record index */ + int numrecs; + xfs_btree_curops_t *cops = cur->bc_curops; + xfs_btree_blkops_t *bops = cur->bc_blkops; + xfs_btree_recops_t *rops = cur->bc_recops; + + ASSERT(level < cur->bc_nlevels); + XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY); + XFS_BTREE_TRACE_ARGIPR(cur, level, ptrp, recp); + ncur = NULL; + /* + * If we have an external root pointer, and we've made it to the + * root level, allocate a new root block and we're done. 
+ */ + if (!(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) && + (level >= cur->bc_nlevels)) { + error = cops->new_root(cur, &i); + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + xfs_btree_set_ptr_null(cur, ptrp); + *stat = i; + return error; + } + /* + * Make a key out of the record data to be inserted, and save it. + */ + rops->init_key_from_rec(cur, &key, recp); + /* + * If we're off the left edge, return failure. + */ + optr = ptr = cur->bc_ptrs[level]; + if (ptr == 0) { + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + *stat = 0; + return 0; + } + XFS_STATS_INC(xs_bmbt_insrec); + + /* + * Get pointers to the btree buffer and block. + */ + block = bops->get_block(cur, level, &bp); + numrecs = rops->get_numrecs(cur, block); +#ifdef DEBUG + error = bops->check_block(cur, block, level, bp); + if (error) + goto error0; + /* + * Check that the new entry is being inserted in the right place. + */ + if (ptr <= numrecs) { + if (level == 0) { + rp = rops->rec_addr(cur, ptr, block); + xfs_btree_check_rec(cur->bc_btnum, recp, rp); + } else { + kp = rops->key_addr(cur, ptr, block); + xfs_btree_check_key(cur->bc_btnum, &key, kp); + } + } +#endif + /* + * If the block is full, we can't insert the new entry until we + * make the block un-full. + */ + xfs_btree_set_ptr_null(cur, &nptr); + ncur = NULL; + if (numrecs == rops->get_maxrecs(cur, level)) { + error = xfs_btree_make_block_unfull(cur, level, numrecs, + &optr, &ptr, &nptr, &ncur, &nrec, stat); + if (error || *stat == 0) + goto error0; + } + /* + * The current block may have changed during the split. + */ + block = bops->get_block(cur, level, &bp); +#ifdef DEBUG + error = bops->check_block(cur, block, level, bp); + if (error) + return error; +#endif + + /* + * At this point we know there's room for our new entry in the block + * we're pointing at. + */ + error = xfs_btree_insert_entry(cur, level, bp, ptr, &key, ptrp, recp); + if (error) + goto error0; + + /* + * If we inserted at the start of a block, update the parents' keys. 
+ */ + if (optr == 1) { + error = xfs_btree_updkey(cur, &key, level + 1); + if (error) + goto error0; + } + + /* + * Return the new block number, if any. + * If there is one, give back a record value and a cursor too. + */ + *ptrp = nptr; + if (!xfs_btree_ptr_null(cur, &nptr)) { + *recp = nrec; + *curp = ncur; + } + + /* + * If we are tracking the last record in the tree and + * we are at the far right edge of the tree, update it. + */ + if (xfs_btree_is_lastrec(cur, block, level, ptr)) { + error = cops->update_lastrec(cur, block); + if (error) + goto error0; + } + + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + *stat = 1; + return 0; + +error0: + XFS_BTREE_TRACE_CURSOR(cur, XBT_ERROR); + return error; +} + +/* + * Move 1 record left from cur/level if possible. + * Update cur to reflect the new path. + */ +int /* error */ +xfs_btree_lshift( + xfs_btree_cur_t *cur, + int level, + int *stat) /* success/failure */ +{ + int error; /* error return value */ +#ifdef DEBUG + int i; /* loop counter */ +#endif + xfs_btree_key_t key; /* btree key */ + xfs_buf_t *lbp; /* left buffer pointer */ + xfs_btree_block_t *left; /* left btree block */ + int lrecs; /* left record count */ + xfs_buf_t *rbp; /* right buffer pointer */ + xfs_btree_block_t *right; /* right btree block */ + xfs_btree_blkops_t *bops = cur->bc_blkops; + xfs_btree_recops_t *rops = cur->bc_recops; + xfs_btree_ptr_t rptr; + + XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY); + XFS_BTREE_TRACE_ARGI(cur, level); + if ((cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) && + level == cur->bc_nlevels - 1) + goto out0; + /* + * Set up variables for this block as "right". + */ + rbp = cur->bc_bufs[level]; + right = bops->buf_to_block(cur, rbp); +#ifdef DEBUG + error = bops->check_block(cur, right, level, rbp); + if (error) + goto error0; +#endif + /* + * If we've got no left sibling then we can't shift an entry left. 
+ */ + bops->get_sibling(cur, right, &rptr, XFS_BB_LEFTSIB); + if (xfs_btree_ptr_null(cur, &rptr)) + goto out0; + /* + * If the cursor entry is the one that would be moved, don't + * do it... it's too complicated. + */ + if (cur->bc_ptrs[level] <= 1) + goto out0; + + /* + * Set up the left neighbor as "left". + */ + error = bops->read_buf(cur, &rptr, 0, &lbp); + if (error) + goto error0; + left = bops->buf_to_block(cur, lbp); + error = bops->check_block(cur, left, level, lbp); + if (error) + goto error0; + + /* + * If it's full, it can't take another entry. + */ + lrecs = rops->get_numrecs(cur, left); + if (lrecs == rops->get_maxrecs(cur, level)) + goto out0; + /* + * If non-leaf, copy a key and a ptr to the left block. + * Log the changes to the left block. + */ + error = xfs_btree_move_entries(cur, level, rbp, lbp, 1, lrecs + 1, 1); + if (error) + goto error0; + + /* + * Slide the contents of right down one entry. + * Log the changes to the right block. + */ + error = xfs_btree_remove_entry(cur, level, rbp, &key, 1); + if (error) + goto error0; + + /* + * Update the parent key values of right. + */ + error = xfs_btree_updkey(cur, &key, level + 1); + if (error) + goto error0; + /* + * Slide the cursor value left one. + */ + cur->bc_ptrs[level]--; + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + *stat = 1; + return 0; + +out0: + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + *stat = 0; + return 0; + +error0: + XFS_BTREE_TRACE_CURSOR(cur, XBT_ERROR); + return error; +} + +/* + * Move 1 record right from cur/level if possible. + * Update cur to reflect the new path. 
+ */ +int /* error */ +xfs_btree_rshift( + xfs_btree_cur_t *cur, + int level, + int *stat) /* success/failure */ +{ + int error; /* error return value */ + int i; /* loop counter */ + xfs_btree_key_t key; /* btree key */ + xfs_buf_t *lbp; /* left buffer pointer */ + xfs_btree_block_t *left; /* left btree block */ + xfs_buf_t *rbp; /* right buffer pointer */ + xfs_btree_block_t *right; /* right btree block */ + struct xfs_btree_cur *tcur; /* temporary btree cursor */ + xfs_btree_blkops_t *bops = cur->bc_blkops; + xfs_btree_recops_t *rops = cur->bc_recops; + xfs_btree_ptr_t rptr; + int rrecs; /* right record count */ + int lrecs; /* left record count */ + + XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY); + XFS_BTREE_TRACE_ARGI(cur, level); + if ((cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) && + (level == cur->bc_nlevels - 1)) + goto out0; + /* + * Set up variables for this block as "left". + */ + lbp = cur->bc_bufs[level]; + left = bops->buf_to_block(cur, lbp); +#ifdef DEBUG + error = bops->check_block(cur, left, level, lbp); + if (error) + goto error0; +#endif + /* + * If we've got no right sibling then we can't shift an entry right. + */ + bops->get_sibling(cur, left, &rptr, XFS_BB_RIGHTSIB); + if (xfs_btree_ptr_null(cur, &rptr)) + goto out0; + /* + * If the cursor entry is the one that would be moved, don't + * do it... it's too complicated. + */ + lrecs = rops->get_numrecs(cur, left); + if (cur->bc_ptrs[level] >= lrecs) + goto out0; + /* + * Set up the right neighbor as "right". + */ + error = bops->read_buf(cur, &rptr, 0, &rbp); + if (error) + goto error0; + right = bops->buf_to_block(cur, rbp); + error = bops->check_block(cur, right, level, rbp); + if (error) + goto error0; + + /* + * If it's full, it can't take another entry. + */ + rrecs = rops->get_numrecs(cur, right); + if (rrecs == rops->get_maxrecs(cur, level)) + goto out0; + + /* + * Make a hole at the start of the right neighbor block, then + * copy the last left block entry to the hole. 
Update and + * log the right block. + */ + error = xfs_btree_insert_entry(cur, level, rbp, 1, + rops->key_addr(cur, lrecs, left), + rops->ptr_addr(cur, lrecs, left), + rops->rec_addr(cur, lrecs, left)); + if (error) + goto error0; + + /* + * If we are at leaf level, grab the key of the new entry in + * the right block for later. + */ + if (level == 0) + rops->init_key_from_rec(cur, &key, rops->rec_addr(cur, 1, right)); + + /* + * Now update the left block to reflect the moved entry + */ + lrecs--; + rops->set_numrecs(cur, left, lrecs); + bops->log_block(cur, lbp, XFS_BB_NUMRECS); + + /* + * Using a temporary cursor, update the parent key values of the + * block on the right. + */ + error = xfs_btree_dup_cursor(cur, &tcur); + if (error) + goto error0; + i = xfs_btree_lastrec(tcur, level); + XFS_WANT_CORRUPTED_GOTO(i == 1, error0); + + error = xfs_btree_increment(tcur, level, &i); + if (error) + goto error1; + XFS_WANT_CORRUPTED_GOTO(i == 1, error0); + + error = xfs_btree_updkey(cur, &key, level + 1); + if (error) + goto error1; + + xfs_btree_del_cursor(tcur, XFS_BTREE_NOERROR); + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + *stat = 1; + return 0; + +out0: + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + *stat = 0; + return 0; + +error0: + XFS_BTREE_TRACE_CURSOR(cur, XBT_ERROR); + return error; + +error1: + XFS_BTREE_TRACE_CURSOR(tcur, XBT_ERROR); + xfs_btree_del_cursor(tcur, XFS_BTREE_ERROR); + return error; +} + +/* + * Split cur/level block in half. + * Return new block number and the key to its first + * record (to be inserted into parent). 
+ */ +int /* error */ +xfs_btree_split( + xfs_btree_cur_t *cur, + int level, + xfs_btree_ptr_t *ptrp, + xfs_btree_key_t *key, + xfs_btree_cur_t **curp, + int *stat) /* success/failure */ +{ + int error; /* error return value */ + xfs_btree_ptr_t lptr; /* left sibling block ptr */ + xfs_buf_t *lbp; /* left buffer pointer */ + xfs_btree_block_t *left; /* left btree block */ + xfs_btree_ptr_t rptr; /* right sibling block ptr */ + xfs_buf_t *rbp; /* right buffer pointer */ + xfs_btree_block_t *right; /* right btree block */ + xfs_btree_ptr_t rrptr; /* right-right sibling ptr */ + xfs_buf_t *rrbp; /* right-right buffer pointer */ + xfs_btree_block_t *rrblock; /* right-right btree block */ + xfs_btree_blkops_t *bops = cur->bc_blkops; + xfs_btree_recops_t *rops = cur->bc_recops; + int lrecs; + int rrecs; + + XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY); + XFS_BTREE_TRACE_ARGIPK(cur, level, ptrp, key); + + /* + * Set up left block (current one). + */ + lbp = cur->bc_bufs[level]; + bops->buf_to_ptr(cur, lbp, &lptr); + + /* + * Allocate the new block. + * If we can't do it, we're toast. Give up. + */ + error = bops->alloc_block(cur, &lptr, &rptr, 1, stat); + if (error) + goto error0; + if (*stat == 0) + goto out0; + + /* + * Set up the new block as "right". + */ + error = bops->get_buf(cur, &rptr, 0, &rbp); + if (error) + goto error0; + right = bops->buf_to_block(cur, rbp); + + /* + * "Left" is the current (according to the cursor) block. + */ + left = bops->buf_to_block(cur, lbp); +#ifdef DEBUG + error = bops->check_block(cur, left, level, lbp); + if (error) + goto error0; +#endif + + /* + * Fill in the btree header for the new block. + */ + bops->init_sibling(cur, right, left); + + /* + * Split the entries between the old and the new block evenly. + * Make sure that if there's an odd number of entries now, that + * each new block will have the same number of entries. 
+ */ + lrecs = rops->get_numrecs(cur, left); + rrecs = lrecs / 2; + if ((lrecs & 1) && cur->bc_ptrs[level] <= rrecs + 1) + rrecs++; + + /* + * Copy btree block entries from the left block over to the + * new block, the right. Update the right block and log the + * changes. + */ + error = xfs_btree_move_entries(cur, level, lbp, rbp, + (lrecs - rrecs + 1), 1, rrecs); + if (error) + goto error0; + + /* + * Grab the keys to the entries moved to the right block + */ + if (level > 0) { + xfs_btree_key_t *keyp; + keyp = rops->key_addr(cur, 1, right); + rops->move_keys(cur, keyp, key, 0, 0, 1); + } else { + rops->init_key_from_rec(cur, key, rops->rec_addr(cur, 1, right)); + } + + /* + * Find the left block number by looking in the buffer. + * Adjust numrecs, sibling pointers. + */ + bops->get_sibling(cur, left, &rrptr, XFS_BB_RIGHTSIB); + bops->set_sibling(cur, right, &rrptr, XFS_BB_RIGHTSIB); + bops->set_sibling(cur, right, &lptr, XFS_BB_LEFTSIB); + bops->set_sibling(cur, left, &rptr, XFS_BB_RIGHTSIB); + + lrecs -= rrecs; + rops->set_numrecs(cur, left, lrecs); + + bops->log_block(cur, rbp, XFS_BB_ALL_BITS); + bops->log_block(cur, lbp, XFS_BB_NUMRECS | XFS_BB_RIGHTSIB); + + /* + * If there's a block to the new block's right, make that block + * point back to right instead of to left. + */ + if (!xfs_btree_ptr_null(cur, &rrptr)) { + error = bops->read_buf(cur, &rrptr, 0, &rrbp); + if (error) + goto error0; + rrblock = bops->buf_to_block(cur, rrbp); + error = bops->check_block(cur, rrblock, level, rrbp); + if (error) + goto error0; + + bops->set_sibling(cur, rrblock, &rptr, XFS_BB_LEFTSIB); + bops->log_block(cur, rrbp, XFS_BB_LEFTSIB); + } + /* + * If the cursor is really in the right block, move it there. + * If it's just pointing past the last entry in left, then we'll + * insert there, so don't change anything in that case. 
+ */ + if (cur->bc_ptrs[level] > lrecs + 1) { + xfs_btree_setbuf(cur, level, rbp); + cur->bc_ptrs[level] -= lrecs; + } + /* + * If there are more levels, we'll need another cursor which refers + * to the right block, no matter where this cursor was. + */ + if (level + 1 < cur->bc_nlevels) { + error = xfs_btree_dup_cursor(cur, curp); + if (error) + goto error0; + (*curp)->bc_ptrs[level + 1]++; + } + *ptrp = rptr; + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + *stat = 1; + return 0; +out0: + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + *stat = 0; + return 0; + +error0: + XFS_BTREE_TRACE_CURSOR(cur, XBT_ERROR); + return error; +} + +/* + * Update keys at all levels from here to the root along the cursor's path. + */ +int +xfs_btree_updkey( + xfs_btree_cur_t *cur, + xfs_btree_key_t *keyp, /* on-disk format */ + int level) +{ + xfs_btree_block_t *block; + xfs_buf_t *bp; +#ifdef DEBUG + int error; +#endif + xfs_btree_key_t *kp; + int ptr; + xfs_btree_blkops_t *bops = cur->bc_blkops; + xfs_btree_recops_t *rops = cur->bc_recops; + + ASSERT(!(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) || level >= 1); + XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY); + XFS_BTREE_TRACE_ARGIK(cur, level, keyp); + /* + * Go up the tree from this level toward the root. + * At each level, update the key value to the value input. + * Stop when we reach a level where the cursor isn't pointing + * at the first entry in the block. + */ + for (ptr = 1; ptr == 1 && level < cur->bc_nlevels; level++) { + block = bops->get_block(cur, level, &bp); +#ifdef DEBUG + error = bops->check_block(cur, block, level, bp); + if (error) { + XFS_BTREE_TRACE_CURSOR(cur, XBT_ERROR); + return error; + } +#endif + ptr = cur->bc_ptrs[level]; + kp = rops->key_addr(cur, ptr, block); + rops->move_keys(cur, keyp, kp, 0, 0, 1); + rops->log_keys(cur, bp, ptr, ptr); + } + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + return 0; +} + +/* + * Increment cursor by one record at the level. + * For nonzero levels the leaf-ward information is untouched.
+ */ +int /* error */ +xfs_btree_increment( + xfs_btree_cur_t *cur, + int level, + int *stat) /* success/failure */ +{ + xfs_btree_block_t *block; + xfs_btree_ptr_t ptr; + xfs_buf_t *bp; + int error; /* error return value */ + int lev; + xfs_btree_blkops_t *bops = cur->bc_blkops; + xfs_btree_recops_t *rops = cur->bc_recops; + + XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY); + XFS_BTREE_TRACE_ARGI(cur, level); + ASSERT(level < cur->bc_nlevels); + /* + * Read-ahead to the right at this level. + */ + xfs_btree_readahead(cur, level, XFS_BTCUR_RIGHTRA); + /* + * Get a pointer to the btree block. + */ + block = bops->get_block(cur, level, &bp); +#ifdef DEBUG + error = bops->check_block(cur, block, level, bp); + if (error) + goto error0; +#endif + /* + * Increment the ptr at this level. If we're still in the block + * then we're done. + */ + if (++cur->bc_ptrs[level] <= rops->get_numrecs(cur, block)) + goto out1; + /* + * If we just went off the right edge of the tree, return failure. + */ + bops->get_sibling(cur, block, &ptr, XFS_BB_RIGHTSIB); + if (xfs_btree_ptr_null(cur, &ptr)) + goto out0; + + /* + * March up the tree incrementing pointers. + * Stop when we don't go off the right edge of a block. + */ + for (lev = level + 1; lev < cur->bc_nlevels; lev++) { + block = bops->get_block(cur, lev, &bp); +#ifdef DEBUG + error = bops->check_block(cur, block, lev, bp); + if (error) + goto error0; +#endif + if (++cur->bc_ptrs[lev] <= rops->get_numrecs(cur, block)) + break; + /* + * Read-ahead the right block, we're going to read it + * in the next loop. + */ + xfs_btree_readahead(cur, lev, XFS_BTCUR_RIGHTRA); + } + /* + * If we went off the root then we are either seriously + * confused or have the tree root in an inode. + */ + if (lev == cur->bc_nlevels) { + ASSERT(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE); + goto out0; + } + ASSERT(lev < cur->bc_nlevels); + + /* + * Now walk back down the tree, fixing up the cursor's buffer + * pointers and key numbers. 
+ */ + for (block = bops->get_block(cur, lev, &bp); lev > level; ) { + xfs_btree_ptr_t *ptrp; + + ptrp = rops->ptr_addr(cur, cur->bc_ptrs[lev], block); + error = bops->read_buf(cur, ptrp, 0, &bp); + if (error) + goto error0; + lev--; + xfs_btree_setbuf(cur, lev, bp); + block = bops->buf_to_block(cur, bp); + error = bops->check_block(cur, block, lev, bp); + if (error) + goto error0; + cur->bc_ptrs[lev] = 1; + } +out1: + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + *stat = 1; + return 0; + +out0: + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + *stat = 0; + return 0; + +error0: + XFS_BTREE_TRACE_CURSOR(cur, XBT_ERROR); + return error; +} + +/* + * Decrement cursor by one record at the level. + * For nonzero levels the leaf-ward information is untouched. + */ +int /* error */ +xfs_btree_decrement( + xfs_btree_cur_t *cur, + int level, + int *stat) /* success/failure */ +{ + xfs_btree_block_t *block; + xfs_buf_t *bp; + int error; /* error return value */ + int lev; + xfs_btree_ptr_t ptr; + xfs_btree_blkops_t *bops = cur->bc_blkops; + xfs_btree_recops_t *rops = cur->bc_recops; + + XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY); + XFS_BTREE_TRACE_ARGI(cur, level); + ASSERT(level < cur->bc_nlevels); + /* + * Read-ahead to the left at this level. + */ + xfs_btree_readahead(cur, level, XFS_BTCUR_LEFTRA); + /* + * Decrement the ptr at this level. If we're still in the block + * then we're done. + */ + if (--cur->bc_ptrs[level] > 0) + goto out1; + /* + * Get a pointer to the btree block. + */ + block = bops->get_block(cur, level, &bp); +#ifdef DEBUG + error = bops->check_block(cur, block, level, bp); + if (error) + goto error0; +#endif + /* + * If we just went off the left edge of the tree, return failure. + */ + bops->get_sibling(cur, block, &ptr, XFS_BB_LEFTSIB); + if (xfs_btree_ptr_null(cur, &ptr)) + goto out0; + /* + * March up the tree decrementing pointers. + * Stop when we don't go off the left edge of a block. 
+ */ + for (lev = level + 1; lev < cur->bc_nlevels; lev++) { + if (--cur->bc_ptrs[lev] > 0) + break; + /* + * Read-ahead the left block, we're going to read it + * in the next loop. + */ + xfs_btree_readahead(cur, lev, XFS_BTCUR_LEFTRA); + } + /* + * If we went off the root then we are either seriously + * confused or the root of the tree is in an inode. + */ + if (lev == cur->bc_nlevels) { + ASSERT(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE); + goto out0; + } + ASSERT(lev < cur->bc_nlevels); + /* + * Now walk back down the tree, fixing up the cursor's buffer + * pointers and key numbers. + */ + for (block = bops->get_block(cur, lev, &bp); lev > level; ) { + xfs_btree_ptr_t *ptrp; + + ptrp = rops->ptr_addr(cur, cur->bc_ptrs[lev], block); + error = bops->read_buf(cur, ptrp, 0, &bp); + if (error) + goto error0; + lev--; + xfs_btree_setbuf(cur, lev, bp); + block = bops->buf_to_block(cur, bp); + error = bops->check_block(cur, block, lev, bp); + if (error) + goto error0; + cur->bc_ptrs[lev] = rops->get_numrecs(cur, block); + } +out1: + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + *stat = 1; + return 0; + +out0: + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + *stat = 0; + return 0; + +error0: + XFS_BTREE_TRACE_CURSOR(cur, XBT_ERROR); + return error; +} + +/* + * Insert the record at the point referenced by cur. + * The cursor may be inconsistent on return if splits have been done.
+ */ +int +xfs_btree_insert( + xfs_btree_cur_t *cur, + int *stat) +{ + int error; /* error return value */ + int i; /* result value, 0 for failure */ + int level; /* current level number in btree */ + xfs_btree_ptr_t nptr; /* new block number (split result) */ + xfs_btree_cur_t *ncur; /* new cursor (split result) */ + xfs_btree_cur_t *pcur; /* previous level's cursor */ + xfs_btree_rec_t rec; /* record to insert */ + xfs_btree_curops_t *cops = cur->bc_curops; + + level = 0; + xfs_btree_set_ptr_null(cur, &nptr); + cur->bc_recops->init_rec_from_cur(cur, &rec); + ncur = NULL; + pcur = cur; + /* + * Loop going up the tree, starting at the leaf level. + * Stop when we don't get a split block, that must mean that + * the insert is finished with this level. + */ + do { + /* + * Insert nrec/nptr into this level of the tree. + * Note if we fail, nptr will be null. + */ + error = xfs_btree_insrec(pcur, level, &nptr, &rec, &ncur, &i); + if (error) { + if (pcur != cur) + xfs_btree_del_cursor(pcur, XFS_BTREE_ERROR); + goto error0; + } + XFS_WANT_CORRUPTED_GOTO(i == 1, error0); + level++; + /* + * See if the cursor we just used is trash. + * Can't trash the caller's cursor, but otherwise we should + * if ncur is a new cursor or we're about to be done. + */ + if (pcur != cur && (ncur || xfs_btree_ptr_null(cur, &nptr))) { + /* + * some btrees need to move state from one cursor + * to another here. + */ + if (cops->update_cursor) + cops->update_cursor(pcur, cur); + cur->bc_nlevels = pcur->bc_nlevels; + xfs_btree_del_cursor(pcur, XFS_BTREE_NOERROR); + } + /* + * If we got a new cursor, switch to it. + */ + if (ncur) { + pcur = ncur; + ncur = NULL; + } + } while (!xfs_btree_ptr_null(cur, &nptr)); + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + *stat = i; + return 0; +error0: + XFS_BTREE_TRACE_CURSOR(cur, XBT_ERROR); + return error; +} + +/* + * Delete the record pointed to by cur. + * The cursor refers to the place where the record was (could be inserted) + * when the operation returns. 
+ */ +int /* error */ +xfs_btree_delete( + xfs_btree_cur_t *cur, + int *stat) /* success/failure */ +{ + int error; /* error return value */ + int i; + int level; + + XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY); + /* + * Go up the tree, starting at leaf level. + * If 2 is returned then a join was done; go to the next level. + * Otherwise we are done. + */ + for (level = 0, i = 2; i == 2; level++) { + error = xfs_btree_delrec(cur, level, &i); + if (error) + goto error0; + } + if (i == 0) { + for (level = 1; level < cur->bc_nlevels; level++) { + if (cur->bc_ptrs[level] == 0) { + error = xfs_btree_decrement(cur, level, &i); + if (error) + goto error0; + break; + } + } + } + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + *stat = i; + return 0; +error0: + XFS_BTREE_TRACE_CURSOR(cur, XBT_ERROR); + return error; +} + +STATIC int +xfs_btree_lookup_get_block( + xfs_btree_cur_t *cur, /* btree cursor */ + xfs_btree_block_t **blkp, /* current btree block */ + int level, /* level in the btree */ + xfs_btree_ptr_t *pp) /* ptr to btree block */ +{ + xfs_buf_t *bp; /* buffer pointer for btree block */ + xfs_daddr_t d; /* disk address of btree block */ + int error = 0; + xfs_btree_block_t *block; /* current btree block */ + xfs_btree_blkops_t *bops = cur->bc_blkops; + xfs_btree_recops_t *rops = cur->bc_recops; + + /* + * special case the root block if in an inode + */ + if ((cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) && + (level >= cur->bc_nlevels - 1)) { + *blkp = bops->get_block(cur, level, &bp); + return 0; + } + + /* + * Get the disk address we're looking for. + */ + d = rops->ptr_to_daddr(cur, pp); + /* + * If the old buffer at this level is for a different block, + * throw it away, otherwise just use it. + */ + bp = cur->bc_bufs[level]; + if (bp && XFS_BUF_ADDR(bp) != d) + bp = NULL; + if (!bp) { + /* + * Need to get a new buffer. Read it, then + * set it in the cursor, releasing the old one. 
+ */ + error = bops->read_buf(cur, pp, 0, &bp); + if (error) + return error; + xfs_btree_setbuf(cur, level, bp); + /* + * Point to the btree block, now that we have the buffer + */ + block = bops->buf_to_block(cur, bp); + error = bops->check_block(cur, block, level, bp); + if (error) + return error; + } else + block = bops->buf_to_block(cur, bp); + + *blkp = block; + return 0; +} + +/* + * Lookup the record. The cursor is made to point to it, based on dir. + * Return 0 if can't find any such record, 1 for success. + */ +int /* error */ +xfs_btree_lookup( + xfs_btree_cur_t *cur, /* btree cursor */ + xfs_lookup_t dir, /* <=, ==, or >= */ + int *stat) /* success/failure */ +{ + xfs_btree_block_t *block = NULL; /* current btree block */ + __int64_t diff; /* difference for the current key */ + int error; /* error return value */ + int keyno = 0; /* current key number */ + int level; /* level in the btree */ + xfs_btree_ptr_t *pp; /* ptr to btree block */ + xfs_btree_ptr_t ptr; /* ptr to btree block */ + xfs_btree_blkops_t *bops = cur->bc_blkops; + xfs_btree_recops_t *rops = cur->bc_recops; + + /* + * initialise start pointer from cursor + */ + rops->init_ptr_from_cur(cur, &ptr); + pp = &ptr; + + /* + * Iterate over each level in the btree, starting at the root. + * For each level above the leaves, find the key we need, based + * on the lookup record, then follow the corresponding block + * pointer down to the next level. + */ + for (level = cur->bc_nlevels - 1, diff = 1; level >= 0; level--) { + /* + * Get the block we need to do the lookup on. + */ + error = xfs_btree_lookup_get_block(cur, &block, level, pp); + if (error) + goto error0; + + /* + * If we already had a key match at a higher level, we know + * we need to use the first entry in this block. + */ + if (diff == 0) + keyno = 1; + /* + * Otherwise we need to search this block. Do a binary search. 
+ */ + else { + int high; /* high entry number */ + int low; /* low entry number */ + + /* + * Set low and high entry numbers, 1-based. + */ + low = 1; + high = rops->get_numrecs(cur, block); + if (!high) { + /* + * If the block is empty, the tree must + * be an empty leaf. + */ + ASSERT(level == 0 && cur->bc_nlevels == 1); + cur->bc_ptrs[0] = dir != XFS_LOOKUP_LE; + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + *stat = 0; + return 0; + } + /* + * Binary search the block. + */ + while (low <= high) { + xfs_btree_key_t key; + xfs_btree_key_t *kp; + + XFS_STATS_INC(xs_bmbt_compare); + /* + * keyno is average of low and high. + */ + keyno = (low + high) >> 1; + /* + * Get current search key + */ + if (level > 0) { + kp = rops->key_addr(cur, keyno, block); + } else { + xfs_btree_rec_t *krp; + + krp = rops->rec_addr(cur, keyno, block); + kp = &key; + rops->init_key_from_rec(cur, kp, krp); + } + /* + * Compute difference to get next direction. + */ + diff = rops->key_diff(cur, kp); + + /* + * Less than, move right. + * Greater than, move left. + * Equal, we're done. + */ + if (diff < 0) + low = keyno + 1; + else if (diff > 0) + high = keyno - 1; + else + break; + } + } + /* + * If there are more levels, set up for the next level + * by getting the block number and filling in the cursor. + */ + if (level > 0) { + /* + * If we moved left, need the previous key number, + * unless there isn't one. + */ + if (diff > 0 && --keyno < 1) + keyno = 1; + pp = rops->ptr_addr(cur, keyno, block); + +#ifdef DEBUG + error = bops->xfs_btree_check_ptr(cur, pp, level); + if (error) + goto error0; +#endif + cur->bc_ptrs[level] = keyno; + } + } + /* + * Done with the search. + * See if we need to adjust the results. + */ + if (dir != XFS_LOOKUP_LE && diff < 0) { + keyno++; + /* + * If ge search and we went off the end of the block, but it's + * not the last block, we're in the wrong block. 
+ */ + bops->get_sibling(cur, block, &ptr, XFS_BB_RIGHTSIB); + if (dir == XFS_LOOKUP_GE && + keyno > rops->get_numrecs(cur, block) && + !xfs_btree_ptr_null(cur, &ptr)) { + int i; + + cur->bc_ptrs[0] = keyno; + error = xfs_btree_increment(cur, 0, &i); + if (error) + goto error0; + ASSERT(i == 1); + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + *stat = 1; + return 0; + } + } + else if (dir == XFS_LOOKUP_LE && diff > 0) + keyno--; + cur->bc_ptrs[0] = keyno; + /* + * Return if we succeeded or not. + */ + if (keyno == 0 || keyno > rops->get_numrecs(cur, block)) + *stat = 0; + else + *stat = ((dir != XFS_LOOKUP_EQ) || (diff == 0)); + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + return 0; + +error0: + XFS_BTREE_TRACE_CURSOR(cur, XBT_ERROR); + return error; +} + +/* + * Allocate a new root block, fill it in. + */ +int /* error */ +xfs_btree_newroot( + xfs_btree_cur_t *cur, /* btree cursor */ + int *stat) /* success/failure */ +{ + xfs_btree_block_t *block; /* one half of the old root block */ + xfs_buf_t *bp; /* buffer containing block */ + int error; /* error return value */ + xfs_btree_key_t *kp; /* btree key pointer */ + xfs_buf_t *lbp; /* left buffer pointer */ + xfs_btree_block_t *left; /* left btree block */ + xfs_buf_t *nbp; /* new (root) buffer */ + xfs_btree_block_t *new; /* new (root) btree block */ + int nptr; /* new value for key index, 1 or 2 */ + xfs_btree_ptr_t *pp; /* btree address pointer */ + xfs_buf_t *rbp; /* right buffer pointer */ + xfs_btree_block_t *right; /* right btree block */ + xfs_btree_curops_t *cops = cur->bc_curops; + xfs_btree_blkops_t *bops = cur->bc_blkops; + xfs_btree_recops_t *rops = cur->bc_recops; + xfs_btree_ptr_t rptr; + xfs_btree_ptr_t lptr; + + XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY); + //ASSERT(cur->bc_nlevels < XFS_IN_MAXLEVELS(cur->bc_mp)); // inobt + //ASSERT(cur->bc_nlevels < XFS_AG_MAXLEVELS(cur->bc_mp)); // alloc + + /* + * Get a block & a buffer. + */ + rops->init_ptr_from_cur(cur, &rptr); + + /* + * Allocate the new block. 
+ * If we can't do it, we're toast. Give up. + */ + error = bops->alloc_block(cur, &rptr, &lptr, 1, stat); + if (error) + goto error0; + if (*stat == 0) + goto out0; + + /* + * Set up the new block. + */ + error = bops->get_buf(cur, &lptr, 0, &nbp); + if (error) + goto error0; + new = bops->buf_to_block(cur, nbp); + + /* + * Set the root data in the a.g. inode structure, + * increasing the level by 1. + */ + cops->set_root(cur, &lptr, 1); + + /* + * At the previous root level there are now two blocks: the old + * root, and the new block generated when it was split. + * We don't know which one the cursor is pointing at, so we + * set up variables "left" and "right" for each case. + */ + bp = cur->bc_bufs[cur->bc_nlevels - 1]; + block = bops->buf_to_block(cur, bp); +#ifdef DEBUG + error = bops->check_block(cur, block, cur->bc_nlevels - 1, bp); + if (error) + goto error0; +#endif + bops->get_sibling(cur, block, &rptr, XFS_BB_RIGHTSIB); + if (!xfs_btree_ptr_null(cur, &rptr)) { + /* + * Our block is left, pick up the right block. + */ + lbp = bp; + bops->buf_to_ptr(cur, lbp, &lptr); + left = block; + error = bops->read_buf(cur, &rptr, 0, &rbp); + if (error) + goto error0; + bp = rbp; + right = bops->buf_to_block(cur, rbp); + error = bops->check_block(cur, right, cur->bc_nlevels-1, rbp); + if (error) + goto error0; + nptr = 1; + } else { + /* + * Our block is right, pick up the left block. + */ + rbp = bp; + bops->buf_to_ptr(cur, rbp, &rptr); + right = block; + bops->get_sibling(cur, right, &lptr, XFS_BB_LEFTSIB); + error = bops->read_buf(cur, &lptr, 0, &lbp); + if (error) + goto error0; + bp = lbp; + left = bops->buf_to_block(cur, lbp); + error = bops->check_block(cur, left, cur->bc_nlevels-1, lbp); + if (error) + goto error0; + nptr = 2; + } + /* + * Fill in the new block's btree header and log it. 
+ * XXX: this is 32bit btree specific + */ + new->bb_h.bb_magic = cpu_to_be32(xfs_magics[cur->bc_btnum]); + new->bb_h.bb_level = cpu_to_be16(cur->bc_nlevels); + new->bb_h.bb_numrecs = cpu_to_be16(2); + new->bb_u.s.bb_leftsib = cpu_to_be32(NULLAGBLOCK); + new->bb_u.s.bb_rightsib = cpu_to_be32(NULLAGBLOCK); + bops->log_block(cur, nbp, XFS_BB_ALL_BITS); + ASSERT(!xfs_btree_ptr_null(cur, &lptr) && !xfs_btree_ptr_null(cur, &rptr)); + + /* + * Fill in the key data in the new root. + */ + kp = rops->key_addr(cur, 1, new); + if (be16_to_cpu(left->bb_h.bb_level) > 0) { + rops->set_key(cur, kp, 0, rops->key_addr(cur, 1, left)); + rops->set_key(cur, kp, 1, rops->key_addr(cur, 1, right)); + } else { + rops->init_key_from_rec(cur, kp, rops->rec_addr(cur, 1, left)); + kp = rops->key_addr(cur, 2, new); + rops->init_key_from_rec(cur, kp, rops->rec_addr(cur, 1, right)); + } + rops->log_keys(cur, nbp, 1, 2); + /* + * Fill in the pointer data in the new root. + */ + pp = rops->ptr_addr(cur, 1, new); + rops->set_ptr(cur, pp, 0, &lptr); + rops->set_ptr(cur, pp, 1, &rptr); + rops->log_ptrs(cur, nbp, 1, 2); + /* + * Fix up the cursor. + */ + xfs_btree_setbuf(cur, cur->bc_nlevels, nbp); + cur->bc_ptrs[cur->bc_nlevels] = nptr; + cur->bc_nlevels++; + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + *stat = 1; + return 0; +error0: + XFS_BTREE_TRACE_CURSOR(cur, XBT_ERROR); + return error; +out0: + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + *stat = 0; + return 0; +} + +/* + * Update the record referred to by cur to the value in the + * given record. This either works (return 0) or gets an + * EFSCORRUPTED error.
+ */ +int +xfs_btree_update( + xfs_btree_cur_t *cur, + xfs_btree_rec_t *rec) +{ + xfs_btree_block_t *block; + xfs_buf_t *bp; + int error; + int ptr; + xfs_btree_rec_t *rp; + xfs_btree_curops_t *cops = cur->bc_curops; + xfs_btree_blkops_t *bops = cur->bc_blkops; + xfs_btree_recops_t *rops = cur->bc_recops; + + XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY); + //XFS_BTREE_TRACE_ARGR(cur, rec); + + /* + * Pick up the current block. + */ + block = bops->get_block(cur, 0, &bp); +#ifdef DEBUG + error = bops->check_block(cur, block, 0, bp); + if (error) + goto error0; +#endif + /* + * Get the address of the rec to be updated. + */ + ptr = cur->bc_ptrs[0]; + rp = rops->rec_addr(cur, ptr, block); + /* + * Fill in the new contents and log them. + */ + rops->move_recs(cur, rec, rp, 0, 0, 1); + rops->log_recs(cur, bp, ptr, ptr); + /* + * If we are tracking the last record in the tree and + * we are at the far right edge of the tree, update it. + */ + if (xfs_btree_is_lastrec(cur, block, 0, ptr)) { + error = cops->update_lastrec(cur, block); + if (error) + goto error0; + } + + /* + * Updating first record in leaf. Pass new key value up to our parent. + */ + if (ptr == 1) { + xfs_btree_key_t key; + + rops->init_key_from_rec(cur, &key, rec); + error = xfs_btree_updkey(cur, &key, 1); + if (error) + goto error0; + } + + XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); + return 0; + +error0: + XFS_BTREE_TRACE_CURSOR(cur, XBT_ERROR); + return error; +} + Index: 2.6.x-xfs-new/fs/xfs/xfs_btree_trace.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ 2.6.x-xfs-new/fs/xfs/xfs_btree_trace.c 2007-11-06 19:40:29.758667866 +1100 @@ -0,0 +1,202 @@ +/* + * Copyright (c) 2000-2001,2005 Silicon Graphics, Inc. + * All Rights Reserved. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation. 
+ * + * This program is distributed in the hope that it would be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write the Free Software Foundation, + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_types.h" +#include "xfs_bit.h" +#include "xfs_log.h" +#include "xfs_inum.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_ag.h" +#include "xfs_dir2.h" +#include "xfs_dmapi.h" +#include "xfs_mount.h" +#include "xfs_bmap_btree.h" +#include "xfs_alloc_btree.h" +#include "xfs_ialloc_btree.h" +#include "xfs_dir2_sf.h" +#include "xfs_attr_sf.h" +#include "xfs_dinode.h" +#include "xfs_inode.h" +#include "xfs_btree.h" +#include "xfs_ialloc.h" +#include "xfs_alloc.h" +#include "xfs_error.h" + +#if defined(XFS_BTREE_TRACE) + +/* + * Add a trace buffer entry for arguments, for one integer arg. + */ +STATIC void +xfs_btree_trace_argi( + const char *func, + xfs_btree_cur_t *cur, + int i, + int line) +{ + cur->bc_trcops->trace(func, cur, XBT_ARGS, XFS_BTREE_KTRACE_ARGI, line, + i, 0, 0, 0, + 0, 0, 0, 0, + 0, 0, 0); +} + +/* + * Add a trace buffer entry for arguments, for a buffer & 1 integer arg. + */ +STATIC void +xfs_btree_trace_argbi( + const char *func, + xfs_btree_cur_t *cur, + xfs_buf_t *b, + int i, + int line) +{ + cur->bc_trcops->trace(func, cur, XBT_ARGS, XFS_BTREE_KTRACE_ARGBI, line, + (__psunsigned_t)b, i, 0, 0, + 0, 0, 0, 0, + 0, 0, 0); +} + +/* + * Add a trace buffer entry for arguments, for a buffer & 2 integer args. 
+ */ +STATIC void +xfs_btree_trace_argbii( + const char *func, + xfs_btree_cur_t *cur, + xfs_buf_t *b, + int i0, + int i1, + int line) +{ + cur->bc_trcops->trace(func, cur, XBT_ARGS, XFS_BTREE_KTRACE_ARGBII, line, + (__psunsigned_t)b, i0, i1, 0, + 0, 0, 0, 0, + 0, 0, 0); +} + +/* + * Add a trace buffer entry for arguments, for int, ptr, key. + */ +STATIC void +xfs_btree_trace_argipk( + const char *func, + xfs_btree_cur_t *cur, + int i, + xfs_btree_ptr_t *p, + xfs_btree_key_t *k, + int line) +{ + __uint64_t v = 0, u = 0; + if (XFS_BTREE_LONG_PTRS(cur->bc_btnum)) { + u = be64_to_cpu(p->u.l); + v = be64_to_cpu(k->u.l); + } else { + u = be32_to_cpu(p->u.s); + v = be32_to_cpu(k->u.s); + } + + cur->bc_trcops->trace(func, cur, XBT_ARGS, XFS_BTREE_KTRACE_ARGIPK, + line, i, u >> 32, (int)u, + v >> 32, (int)v, 0, 0, 0, + 0, 0, 0); +} + +/* + * Add a trace buffer entry for arguments, for int, ptr, rec. + */ +STATIC void +xfs_btree_trace_argipr( + const char *func, + xfs_btree_cur_t *cur, + int i, + xfs_btree_ptr_t *p, + xfs_btree_rec_t *r, + int line) +{ + __uint64_t l0 = 0, l1 = 0, l2 = 0; + __uint64_t d; + + if (XFS_BTREE_LONG_PTRS(cur->bc_btnum)) + d = be64_to_cpu(p->u.l); + else + d = be32_to_cpu(p->u.s); + + if (cur->bc_trcops->record) + cur->bc_trcops->record(cur, r, &l0, &l1, &l2); + + cur->bc_trcops->trace(func, cur, XBT_ARGS, XFS_BTREE_KTRACE_ARGIPR, line, + i, d >> 32, (int)d, l0 >> 32, + (int)l0, l1 >> 32, (int)l1, l2 >> 32, + (int)l2, 0, 0); +} + +/* + * Add a trace buffer entry for arguments, for int, key. + */ +STATIC void +xfs_btree_trace_argik( + const char *func, + xfs_btree_cur_t *cur, + int i, + xfs_btree_key_t *k, + int line) +{ + __uint64_t v = 0; + + if (XFS_BTREE_LONG_PTRS(cur->bc_btnum)) + v = be64_to_cpu(k->u.l); + else + v = be32_to_cpu(k->u.s); + + cur->bc_trcops->trace(func, cur, XBT_ARGS, XFS_BTREE_KTRACE_ARGIPK, line, + i, 0, 0, v >> 32, (int)v, + 0, 0, 0, + 0, 0, 0); +} + +/* + * Add a trace buffer entry for the cursor/operation. 
+ */ +STATIC void +xfs_btree_trace_cursor( + const char *func, + xfs_btree_cur_t *cur, + char *s, + int line) +{ + __uint32_t s0 = 0; + __uint64_t l0 = 0, l1 = 0; + + if (cur->bc_trcops->cursor) + cur->bc_trcops->cursor(cur, &s0, &l0, &l1); + + cur->bc_trcops->enter(func, cur, s, XFS_BTREE_KTRACE_CUR, line, + (cur->bc_nlevels << 24) | s0, + l0 >> 32, (int)l0, + l1 >> 32, (int)l1, + (unsigned long)cur->bc_bufs[0], (unsigned long)cur->bc_bufs[1], + (unsigned long)cur->bc_bufs[2], (unsigned long)cur->bc_bufs[3], + (cur->bc_ptrs[0] << 16) | cur->bc_ptrs[1], + (cur->bc_ptrs[2] << 16) | cur->bc_ptrs[3]); +} + + + Index: 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_ialloc.c 2007-10-16 08:52:58.000000000 +1000 +++ 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c 2007-11-06 19:40:29.762667351 +1100 @@ -322,7 +322,7 @@ xfs_ialloc_ag_alloc( return error; } ASSERT(i == 0); - if ((error = xfs_inobt_insert(cur, &i))) { + if ((error = xfs_btree_insert(cur, &i))) { xfs_btree_del_cursor(cur, XFS_BTREE_ERROR); return error; } @@ -673,7 +673,7 @@ nextag: goto error0; XFS_WANT_CORRUPTED_GOTO(i == 1, error0); freecount += rec.ir_freecount; - if ((error = xfs_inobt_increment(cur, 0, &i))) + if ((error = xfs_btree_increment(cur, 0, &i))) goto error0; } while (i == 1); @@ -717,7 +717,7 @@ nextag: /* * Search left with tcur, back up 1 record. */ - if ((error = xfs_inobt_decrement(tcur, 0, &i))) + if ((error = xfs_btree_decrement(tcur, 0, &i))) goto error1; doneleft = !i; if (!doneleft) { @@ -731,7 +731,7 @@ nextag: /* * Search right with cur, go forward 1 record. */ - if ((error = xfs_inobt_increment(cur, 0, &i))) + if ((error = xfs_btree_increment(cur, 0, &i))) goto error1; doneright = !i; if (!doneright) { @@ -793,7 +793,7 @@ nextag: * further left. 
*/ if (useleft) { - if ((error = xfs_inobt_decrement(tcur, 0, + if ((error = xfs_btree_decrement(tcur, 0, &i))) goto error1; doneleft = !i; @@ -813,7 +813,7 @@ nextag: * further right. */ else { - if ((error = xfs_inobt_increment(cur, 0, + if ((error = xfs_btree_increment(cur, 0, &i))) goto error1; doneright = !i; @@ -868,7 +868,7 @@ nextag: XFS_WANT_CORRUPTED_GOTO(i == 1, error0); if (rec.ir_freecount > 0) break; - if ((error = xfs_inobt_increment(cur, 0, &i))) + if ((error = xfs_btree_increment(cur, 0, &i))) goto error0; XFS_WANT_CORRUPTED_GOTO(i == 1, error0); } @@ -902,7 +902,7 @@ nextag: goto error0; XFS_WANT_CORRUPTED_GOTO(i == 1, error0); freecount += rec.ir_freecount; - if ((error = xfs_inobt_increment(cur, 0, &i))) + if ((error = xfs_btree_increment(cur, 0, &i))) goto error0; } while (i == 1); ASSERT(freecount == be32_to_cpu(agi->agi_freecount) || @@ -1012,7 +1012,7 @@ xfs_difree( goto error0; if (i) { freecount += rec.ir_freecount; - if ((error = xfs_inobt_increment(cur, 0, &i))) + if ((error = xfs_btree_increment(cur, 0, &i))) goto error0; } } while (i == 1); @@ -1074,8 +1074,8 @@ xfs_difree( xfs_trans_mod_sb(tp, XFS_TRANS_SB_ICOUNT, -ilen); xfs_trans_mod_sb(tp, XFS_TRANS_SB_IFREE, -(ilen - 1)); - if ((error = xfs_inobt_delete(cur, &i))) { - cmn_err(CE_WARN, "xfs_difree: xfs_inobt_delete returned an error %d on %s.\n", + if ((error = xfs_btree_delete(cur, &i))) { + cmn_err(CE_WARN, "xfs_difree: xfs_btree_delete returned an error %d on %s.\n", error, mp->m_fsname); goto error0; } @@ -1117,7 +1117,7 @@ xfs_difree( goto error0; if (i) { freecount += rec.ir_freecount; - if ((error = xfs_inobt_increment(cur, 0, &i))) + if ((error = xfs_btree_increment(cur, 0, &i))) goto error0; } } while (i == 1); Index: 2.6.x-xfs-new/fs/xfs/xfs_ialloc_btree.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_ialloc_btree.c 2007-06-05 22:12:50.000000000 +1000 +++ 2.6.x-xfs-new/fs/xfs/xfs_ialloc_btree.c 2007-11-06 
19:40:29.770666321 +1100 @@ -39,711 +39,132 @@ #include "xfs_alloc.h" #include "xfs_error.h" -STATIC void xfs_inobt_log_block(xfs_trans_t *, xfs_buf_t *, int); -STATIC void xfs_inobt_log_keys(xfs_btree_cur_t *, xfs_buf_t *, int, int); -STATIC void xfs_inobt_log_ptrs(xfs_btree_cur_t *, xfs_buf_t *, int, int); -STATIC void xfs_inobt_log_recs(xfs_btree_cur_t *, xfs_buf_t *, int, int); -STATIC int xfs_inobt_lshift(xfs_btree_cur_t *, int, int *); -STATIC int xfs_inobt_newroot(xfs_btree_cur_t *, int *); -STATIC int xfs_inobt_rshift(xfs_btree_cur_t *, int, int *); -STATIC int xfs_inobt_split(xfs_btree_cur_t *, int, xfs_agblock_t *, - xfs_inobt_key_t *, xfs_btree_cur_t **, int *); -STATIC int xfs_inobt_updkey(xfs_btree_cur_t *, xfs_inobt_key_t *, int); /* - * Single level of the xfs_inobt_delete record deletion routine. - * Delete record pointed to by cur/level. - * Remove the record from its block then rebalance the tree. - * Return 0 for error, 1 for done, 2 to go on to the next level. + * Get the block pointer for the given level of the cursor. + * Fill in the buffer pointer, if applicable. */ -STATIC int /* error */ -xfs_inobt_delrec( - xfs_btree_cur_t *cur, /* btree cursor */ - int level, /* level removing record from */ - int *stat) /* fail/done/go-on */ +STATIC xfs_btree_block_t * +xfs_inobt_get_block( + xfs_btree_cur_t *cur, + int level, + xfs_buf_t **bpp) { - xfs_buf_t *agbp; /* buffer for a.g. 
inode header */ - xfs_mount_t *mp; /* mount structure */ - xfs_agi_t *agi; /* allocation group inode header */ - xfs_inobt_block_t *block; /* btree block record/key lives in */ - xfs_agblock_t bno; /* btree block number */ - xfs_buf_t *bp; /* buffer for block */ - int error; /* error return value */ - int i; /* loop index */ - xfs_inobt_key_t key; /* kp points here if block is level 0 */ - xfs_inobt_key_t *kp = NULL; /* pointer to btree keys */ - xfs_agblock_t lbno; /* left block's block number */ - xfs_buf_t *lbp; /* left block's buffer pointer */ - xfs_inobt_block_t *left; /* left btree block */ - xfs_inobt_key_t *lkp; /* left block key pointer */ - xfs_inobt_ptr_t *lpp; /* left block address pointer */ - int lrecs = 0; /* number of records in left block */ - xfs_inobt_rec_t *lrp; /* left block record pointer */ - xfs_inobt_ptr_t *pp = NULL; /* pointer to btree addresses */ - int ptr; /* index in btree block for this rec */ - xfs_agblock_t rbno; /* right block's block number */ - xfs_buf_t *rbp; /* right block's buffer pointer */ - xfs_inobt_block_t *right; /* right btree block */ - xfs_inobt_key_t *rkp; /* right block key pointer */ - xfs_inobt_rec_t *rp; /* pointer to btree records */ - xfs_inobt_ptr_t *rpp; /* right block address pointer */ - int rrecs = 0; /* number of records in right block */ - int numrecs; - xfs_inobt_rec_t *rrp; /* right block record pointer */ - xfs_btree_cur_t *tcur; /* temporary btree cursor */ + ASSERT(level < cur->bc_nlevels); + *bpp = cur->bc_bufs[level]; + return (xfs_btree_block_t *)XFS_BUF_TO_INOBT_BLOCK(*bpp); +} - mp = cur->bc_mp; - /* - * Get the index of the entry being deleted, check for nothing there. - */ - ptr = cur->bc_ptrs[level]; - if (ptr == 0) { - *stat = 0; - return 0; - } - - /* - * Get the buffer & block containing the record or key/ptr. 
- */ - bp = cur->bc_bufs[level]; - block = XFS_BUF_TO_INOBT_BLOCK(bp); -#ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, block, level, bp))) - return error; -#endif - /* - * Fail if we're off the end of the block. - */ +STATIC int +xfs_inobt_get_buf( + xfs_btree_cur_t *cur, + xfs_btree_ptr_t *ptr, + int flags, + xfs_buf_t **bpp) +{ + xfs_buf_t *bp; - numrecs = be16_to_cpu(block->bb_numrecs); - if (ptr > numrecs) { - *stat = 0; - return 0; - } - /* - * It's a nonleaf. Excise the key and ptr being deleted, by - * sliding the entries past them down one. - * Log the changed areas of the block. - */ - if (level > 0) { - kp = XFS_INOBT_KEY_ADDR(block, 1, cur); - pp = XFS_INOBT_PTR_ADDR(block, 1, cur); -#ifdef DEBUG - for (i = ptr; i < numrecs; i++) { - if ((error = xfs_btree_check_sptr(cur, be32_to_cpu(pp[i]), level))) - return error; - } -#endif - if (ptr < numrecs) { - memmove(&kp[ptr - 1], &kp[ptr], - (numrecs - ptr) * sizeof(*kp)); - memmove(&pp[ptr - 1], &pp[ptr], - (numrecs - ptr) * sizeof(*kp)); - xfs_inobt_log_keys(cur, bp, ptr, numrecs - 1); - xfs_inobt_log_ptrs(cur, bp, ptr, numrecs - 1); - } - } - /* - * It's a leaf. Excise the record being deleted, by sliding the - * entries past it down one. Log the changed areas of the block. - */ - else { - rp = XFS_INOBT_REC_ADDR(block, 1, cur); - if (ptr < numrecs) { - memmove(&rp[ptr - 1], &rp[ptr], - (numrecs - ptr) * sizeof(*rp)); - xfs_inobt_log_recs(cur, bp, ptr, numrecs - 1); - } - /* - * If it's the first record in the block, we'll need a key - * structure to pass up to the next level (updkey). - */ - if (ptr == 1) { - key.ir_startino = rp->ir_startino; - kp = &key; - } - } - /* - * Decrement and log the number of entries in the block. - */ - numrecs--; - block->bb_numrecs = cpu_to_be16(numrecs); - xfs_inobt_log_block(cur->bc_tp, bp, XFS_BB_NUMRECS); - /* - * Is this the root level? If so, we're almost done. 
- */ - if (level == cur->bc_nlevels - 1) { - /* - * If this is the root level, - * and there's only one entry left, - * and it's NOT the leaf level, - * then we can get rid of this level. - */ - if (numrecs == 1 && level > 0) { - agbp = cur->bc_private.i.agbp; - agi = XFS_BUF_TO_AGI(agbp); - /* - * pp is still set to the first pointer in the block. - * Make it the new root of the btree. - */ - bno = be32_to_cpu(agi->agi_root); - agi->agi_root = *pp; - be32_add(&agi->agi_level, -1); - /* - * Free the block. - */ - if ((error = xfs_free_extent(cur->bc_tp, - XFS_AGB_TO_FSB(mp, cur->bc_private.i.agno, bno), 1))) - return error; - xfs_trans_binval(cur->bc_tp, bp); - xfs_ialloc_log_agi(cur->bc_tp, agbp, - XFS_AGI_ROOT | XFS_AGI_LEVEL); - /* - * Update the cursor so there's one fewer level. - */ - cur->bc_bufs[level] = NULL; - cur->bc_nlevels--; - } else if (level > 0 && - (error = xfs_inobt_decrement(cur, level, &i))) - return error; - *stat = 1; - return 0; - } - /* - * If we deleted the leftmost entry in the block, update the - * key values above us in the tree. - */ - if (ptr == 1 && (error = xfs_inobt_updkey(cur, kp, level + 1))) - return error; - /* - * If the number of records remaining in the block is at least - * the minimum, we're done. - */ - if (numrecs >= XFS_INOBT_BLOCK_MINRECS(level, cur)) { - if (level > 0 && - (error = xfs_inobt_decrement(cur, level, &i))) - return error; - *stat = 1; - return 0; - } - /* - * Otherwise, we have to move some records around to keep the - * tree balanced. Look at the left and right sibling blocks to - * see if we can re-balance by moving only one record. - */ - rbno = be32_to_cpu(block->bb_rightsib); - lbno = be32_to_cpu(block->bb_leftsib); - bno = NULLAGBLOCK; - ASSERT(rbno != NULLAGBLOCK || lbno != NULLAGBLOCK); - /* - * Duplicate the cursor so our btree manipulations here won't - * disrupt the next level up. 
- */ - if ((error = xfs_btree_dup_cursor(cur, &tcur))) - return error; - /* - * If there's a right sibling, see if it's ok to shift an entry - * out of it. - */ - if (rbno != NULLAGBLOCK) { - /* - * Move the temp cursor to the last entry in the next block. - * Actually any entry but the first would suffice. - */ - i = xfs_btree_lastrec(tcur, level); - XFS_WANT_CORRUPTED_GOTO(i == 1, error0); - if ((error = xfs_inobt_increment(tcur, level, &i))) - goto error0; - XFS_WANT_CORRUPTED_GOTO(i == 1, error0); - i = xfs_btree_lastrec(tcur, level); - XFS_WANT_CORRUPTED_GOTO(i == 1, error0); - /* - * Grab a pointer to the block. - */ - rbp = tcur->bc_bufs[level]; - right = XFS_BUF_TO_INOBT_BLOCK(rbp); -#ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, right, level, rbp))) - goto error0; -#endif - /* - * Grab the current block number, for future use. - */ - bno = be32_to_cpu(right->bb_leftsib); - /* - * If right block is full enough so that removing one entry - * won't make it too empty, and left-shifting an entry out - * of right to us works, we're done. - */ - if (be16_to_cpu(right->bb_numrecs) - 1 >= - XFS_INOBT_BLOCK_MINRECS(level, cur)) { - if ((error = xfs_inobt_lshift(tcur, level, &i))) - goto error0; - if (i) { - ASSERT(be16_to_cpu(block->bb_numrecs) >= - XFS_INOBT_BLOCK_MINRECS(level, cur)); - xfs_btree_del_cursor(tcur, - XFS_BTREE_NOERROR); - if (level > 0 && - (error = xfs_inobt_decrement(cur, level, - &i))) - return error; - *stat = 1; - return 0; - } - } - /* - * Otherwise, grab the number of records in right for - * future reference, and fix up the temp cursor to point - * to our block again (last record). - */ - rrecs = be16_to_cpu(right->bb_numrecs); - if (lbno != NULLAGBLOCK) { - xfs_btree_firstrec(tcur, level); - if ((error = xfs_inobt_decrement(tcur, level, &i))) - goto error0; - } - } - /* - * If there's a left sibling, see if it's ok to shift an entry - * out of it. 
- */ - if (lbno != NULLAGBLOCK) { - /* - * Move the temp cursor to the first entry in the - * previous block. - */ - xfs_btree_firstrec(tcur, level); - if ((error = xfs_inobt_decrement(tcur, level, &i))) - goto error0; - xfs_btree_firstrec(tcur, level); - /* - * Grab a pointer to the block. - */ - lbp = tcur->bc_bufs[level]; - left = XFS_BUF_TO_INOBT_BLOCK(lbp); -#ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, left, level, lbp))) - goto error0; -#endif - /* - * Grab the current block number, for future use. - */ - bno = be32_to_cpu(left->bb_rightsib); - /* - * If left block is full enough so that removing one entry - * won't make it too empty, and right-shifting an entry out - * of left to us works, we're done. - */ - if (be16_to_cpu(left->bb_numrecs) - 1 >= - XFS_INOBT_BLOCK_MINRECS(level, cur)) { - if ((error = xfs_inobt_rshift(tcur, level, &i))) - goto error0; - if (i) { - ASSERT(be16_to_cpu(block->bb_numrecs) >= - XFS_INOBT_BLOCK_MINRECS(level, cur)); - xfs_btree_del_cursor(tcur, - XFS_BTREE_NOERROR); - if (level == 0) - cur->bc_ptrs[0]++; - *stat = 1; - return 0; - } - } - /* - * Otherwise, grab the number of records in right for - * future reference. - */ - lrecs = be16_to_cpu(left->bb_numrecs); - } - /* - * Delete the temp cursor, we're done with it. - */ - xfs_btree_del_cursor(tcur, XFS_BTREE_NOERROR); - /* - * If here, we need to do a join to keep the tree balanced. - */ - ASSERT(bno != NULLAGBLOCK); - /* - * See if we can join with the left neighbor block. - */ - if (lbno != NULLAGBLOCK && - lrecs + numrecs <= XFS_INOBT_BLOCK_MAXRECS(level, cur)) { - /* - * Set "right" to be the starting block, - * "left" to be the left neighbor. 
- */ - rbno = bno; - right = block; - rrecs = be16_to_cpu(right->bb_numrecs); - rbp = bp; - if ((error = xfs_btree_read_bufs(mp, cur->bc_tp, - cur->bc_private.i.agno, lbno, 0, &lbp, - XFS_INO_BTREE_REF))) - return error; - left = XFS_BUF_TO_INOBT_BLOCK(lbp); - lrecs = be16_to_cpu(left->bb_numrecs); - if ((error = xfs_btree_check_sblock(cur, left, level, lbp))) - return error; - } - /* - * If that won't work, see if we can join with the right neighbor block. - */ - else if (rbno != NULLAGBLOCK && - rrecs + numrecs <= XFS_INOBT_BLOCK_MAXRECS(level, cur)) { - /* - * Set "left" to be the starting block, - * "right" to be the right neighbor. - */ - lbno = bno; - left = block; - lrecs = be16_to_cpu(left->bb_numrecs); - lbp = bp; - if ((error = xfs_btree_read_bufs(mp, cur->bc_tp, - cur->bc_private.i.agno, rbno, 0, &rbp, - XFS_INO_BTREE_REF))) - return error; - right = XFS_BUF_TO_INOBT_BLOCK(rbp); - rrecs = be16_to_cpu(right->bb_numrecs); - if ((error = xfs_btree_check_sblock(cur, right, level, rbp))) - return error; - } - /* - * Otherwise, we can't fix the imbalance. - * Just return. This is probably a logic error, but it's not fatal. - */ - else { - if (level > 0 && (error = xfs_inobt_decrement(cur, level, &i))) - return error; - *stat = 1; - return 0; - } - /* - * We're now going to join "left" and "right" by moving all the stuff - * in "right" to "left" and deleting "right". - */ - if (level > 0) { - /* - * It's a non-leaf. Move keys and pointers. 
- */ - lkp = XFS_INOBT_KEY_ADDR(left, lrecs + 1, cur); - lpp = XFS_INOBT_PTR_ADDR(left, lrecs + 1, cur); - rkp = XFS_INOBT_KEY_ADDR(right, 1, cur); - rpp = XFS_INOBT_PTR_ADDR(right, 1, cur); -#ifdef DEBUG - for (i = 0; i < rrecs; i++) { - if ((error = xfs_btree_check_sptr(cur, be32_to_cpu(rpp[i]), level))) - return error; - } -#endif - memcpy(lkp, rkp, rrecs * sizeof(*lkp)); - memcpy(lpp, rpp, rrecs * sizeof(*lpp)); - xfs_inobt_log_keys(cur, lbp, lrecs + 1, lrecs + rrecs); - xfs_inobt_log_ptrs(cur, lbp, lrecs + 1, lrecs + rrecs); - } else { - /* - * It's a leaf. Move records. - */ - lrp = XFS_INOBT_REC_ADDR(left, lrecs + 1, cur); - rrp = XFS_INOBT_REC_ADDR(right, 1, cur); - memcpy(lrp, rrp, rrecs * sizeof(*lrp)); - xfs_inobt_log_recs(cur, lbp, lrecs + 1, lrecs + rrecs); - } - /* - * If we joined with the left neighbor, set the buffer in the - * cursor to the left block, and fix up the index. - */ - if (bp != lbp) { - xfs_btree_setbuf(cur, level, lbp); - cur->bc_ptrs[level] += lrecs; - } - /* - * If we joined with the right neighbor and there's a level above - * us, increment the cursor at that level. - */ - else if (level + 1 < cur->bc_nlevels && - (error = xfs_alloc_increment(cur, level + 1, &i))) - return error; - /* - * Fix up the number of records in the surviving block. - */ - lrecs += rrecs; - left->bb_numrecs = cpu_to_be16(lrecs); - /* - * Fix up the right block pointer in the surviving block, and log it. - */ - left->bb_rightsib = right->bb_rightsib; - xfs_inobt_log_block(cur->bc_tp, lbp, XFS_BB_NUMRECS | XFS_BB_RIGHTSIB); - /* - * If there is a right sibling now, make it point to the - * remaining block. 
- */ - if (be32_to_cpu(left->bb_rightsib) != NULLAGBLOCK) { - xfs_inobt_block_t *rrblock; - xfs_buf_t *rrbp; - - if ((error = xfs_btree_read_bufs(mp, cur->bc_tp, - cur->bc_private.i.agno, be32_to_cpu(left->bb_rightsib), 0, - &rrbp, XFS_INO_BTREE_REF))) - return error; - rrblock = XFS_BUF_TO_INOBT_BLOCK(rrbp); - if ((error = xfs_btree_check_sblock(cur, rrblock, level, rrbp))) - return error; - rrblock->bb_leftsib = cpu_to_be32(lbno); - xfs_inobt_log_block(cur->bc_tp, rrbp, XFS_BB_LEFTSIB); - } - /* - * Free the deleting block. - */ - if ((error = xfs_free_extent(cur->bc_tp, XFS_AGB_TO_FSB(mp, - cur->bc_private.i.agno, rbno), 1))) - return error; - xfs_trans_binval(cur->bc_tp, rbp); - /* - * Readjust the ptr at this level if it's not a leaf, since it's - * still pointing at the deletion point, which makes the cursor - * inconsistent. If this makes the ptr 0, the caller fixes it up. - * We can't use decrement because it would change the next level up. - */ - if (level > 0) - cur->bc_ptrs[level]--; - /* - * Return value means the next level up has something to do. - */ - *stat = 2; + bp = xfs_btree_get_bufs(cur->bc_mp, cur->bc_tp, cur->bc_private.i.agno, + be32_to_cpu(ptr->u.inobt), flags); + *bpp = bp; return 0; -error0: - xfs_btree_del_cursor(tcur, XFS_BTREE_ERROR); - return error; } -/* - * Insert one record/level. Return information to the caller - * allowing the next level up to proceed if necessary. 
- */ -STATIC int /* error */ -xfs_inobt_insrec( - xfs_btree_cur_t *cur, /* btree cursor */ - int level, /* level to insert record at */ - xfs_agblock_t *bnop, /* i/o: block number inserted */ - xfs_inobt_rec_t *recp, /* i/o: record data inserted */ - xfs_btree_cur_t **curp, /* output: new cursor replacing cur */ - int *stat) /* success/failure */ +STATIC int +xfs_inobt_read_buf( + xfs_btree_cur_t *cur, + xfs_btree_ptr_t *ptr, + int flags, + xfs_buf_t **bpp) { - xfs_inobt_block_t *block; /* btree block record/key lives in */ - xfs_buf_t *bp; /* buffer for block */ - int error; /* error return value */ - int i; /* loop index */ - xfs_inobt_key_t key; /* key value being inserted */ - xfs_inobt_key_t *kp=NULL; /* pointer to btree keys */ - xfs_agblock_t nbno; /* block number of allocated block */ - xfs_btree_cur_t *ncur; /* new cursor to be used at next lvl */ - xfs_inobt_key_t nkey; /* new key value, from split */ - xfs_inobt_rec_t nrec; /* new record value, for caller */ - int numrecs; - int optr; /* old ptr value */ - xfs_inobt_ptr_t *pp; /* pointer to btree addresses */ - int ptr; /* index in btree block for this rec */ - xfs_inobt_rec_t *rp=NULL; /* pointer to btree records */ + return xfs_btree_read_bufs(cur->bc_mp, + cur->bc_tp, cur->bc_private.i.agno, + be32_to_cpu(ptr->u.inobt), flags, + bpp, XFS_INO_BTREE_REF); +} - /* - * GCC doesn't understand the (arguably complex) control flow in - * this function and complains about uninitialized structure fields - * without this. - */ - memset(&nrec, 0, sizeof(nrec)); +STATIC xfs_btree_block_t * +xfs_inobt_buf_to_block( + xfs_btree_cur_t *cur, + xfs_buf_t *bp) +{ + /* XFS_BUF_TO_INOBT_BLOCK(rbp); */ + return XFS_BUF_TO_BLOCK(bp); +} - /* - * If we made it to the root level, allocate a new root block - * and we're done. 
- */ - if (level >= cur->bc_nlevels) { - error = xfs_inobt_newroot(cur, &i); - *bnop = NULLAGBLOCK; - *stat = i; +STATIC void +xfs_inobt_buf_to_ptr( + xfs_btree_cur_t *cur, + xfs_buf_t *bp, + xfs_btree_ptr_t *ptr) +{ + ptr->u.inobt = cpu_to_be32(XFS_DADDR_TO_AGBNO(cur->bc_mp, XFS_BUF_ADDR(bp))); +} + +STATIC int +xfs_inobt_alloc_block( + xfs_btree_cur_t *cur, + xfs_btree_ptr_t *start, + xfs_btree_ptr_t *new, + int length, + int *stat) +{ + xfs_alloc_arg_t args; /* block allocation args */ + int error; /* error return value */ + xfs_agblock_t sbno = be32_to_cpu(start->u.inobt); + + XFS_BTREE_TRACE_CURSOR(cur, ENTRY); + memset(&args, 0, sizeof(args)); + args.tp = cur->bc_tp; + args.mp = cur->bc_mp; + args.fsbno = XFS_AGB_TO_FSB(args.mp, cur->bc_private.i.agno, sbno); + args.mod = args.minleft = args.alignment = args.total = args.wasdel = + args.isfl = args.userdata = args.minalignslop = 0; + args.minlen = args.maxlen = args.prod = 1; + args.type = XFS_ALLOCTYPE_NEAR_BNO; + + error = xfs_alloc_vextent(&args); + if (error) { + XFS_BTREE_TRACE_CURSOR(cur, ERROR); return error; } - /* - * Make a key out of the record data to be inserted, and save it. - */ - key.ir_startino = recp->ir_startino; - optr = ptr = cur->bc_ptrs[level]; - /* - * If we're off the left edge, return failure. - */ - if (ptr == 0) { + if (args.fsbno == NULLFSBLOCK) { + XFS_BTREE_TRACE_CURSOR(cur, EXIT); *stat = 0; return 0; } - /* - * Get pointers to the btree buffer and block. - */ - bp = cur->bc_bufs[level]; - block = XFS_BUF_TO_INOBT_BLOCK(bp); - numrecs = be16_to_cpu(block->bb_numrecs); -#ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, block, level, bp))) - return error; - /* - * Check that the new entry is being inserted in the right place. 
- */ - if (ptr <= numrecs) { - if (level == 0) { - rp = XFS_INOBT_REC_ADDR(block, ptr, cur); - xfs_btree_check_rec(cur->bc_btnum, recp, rp); - } else { - kp = XFS_INOBT_KEY_ADDR(block, ptr, cur); - xfs_btree_check_key(cur->bc_btnum, &key, kp); - } - } -#endif - nbno = NULLAGBLOCK; - ncur = NULL; - /* - * If the block is full, we can't insert the new entry until we - * make the block un-full. - */ - if (numrecs == XFS_INOBT_BLOCK_MAXRECS(level, cur)) { - /* - * First, try shifting an entry to the right neighbor. - */ - if ((error = xfs_inobt_rshift(cur, level, &i))) - return error; - if (i) { - /* nothing */ - } - /* - * Next, try shifting an entry to the left neighbor. - */ - else { - if ((error = xfs_inobt_lshift(cur, level, &i))) - return error; - if (i) { - optr = ptr = cur->bc_ptrs[level]; - } else { - /* - * Next, try splitting the current block - * in half. If this works we have to - * re-set our variables because - * we could be in a different block now. - */ - if ((error = xfs_inobt_split(cur, level, &nbno, - &nkey, &ncur, &i))) - return error; - if (i) { - bp = cur->bc_bufs[level]; - block = XFS_BUF_TO_INOBT_BLOCK(bp); -#ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, - block, level, bp))) - return error; -#endif - ptr = cur->bc_ptrs[level]; - nrec.ir_startino = nkey.ir_startino; - } else { - /* - * Otherwise the insert fails. - */ - *stat = 0; - return 0; - } - } - } - } - /* - * At this point we know there's room for our new entry in the block - * we're pointing at. - */ - numrecs = be16_to_cpu(block->bb_numrecs); - if (level > 0) { - /* - * It's a non-leaf entry. Make a hole for the new data - * in the key and ptr regions of the block. 
- */ - kp = XFS_INOBT_KEY_ADDR(block, 1, cur); - pp = XFS_INOBT_PTR_ADDR(block, 1, cur); -#ifdef DEBUG - for (i = numrecs; i >= ptr; i--) { - if ((error = xfs_btree_check_sptr(cur, be32_to_cpu(pp[i - 1]), level))) - return error; - } -#endif - memmove(&kp[ptr], &kp[ptr - 1], - (numrecs - ptr + 1) * sizeof(*kp)); - memmove(&pp[ptr], &pp[ptr - 1], - (numrecs - ptr + 1) * sizeof(*pp)); - /* - * Now stuff the new data in, bump numrecs and log the new data. - */ -#ifdef DEBUG - if ((error = xfs_btree_check_sptr(cur, *bnop, level))) - return error; -#endif - kp[ptr - 1] = key; - pp[ptr - 1] = cpu_to_be32(*bnop); - numrecs++; - block->bb_numrecs = cpu_to_be16(numrecs); - xfs_inobt_log_keys(cur, bp, ptr, numrecs); - xfs_inobt_log_ptrs(cur, bp, ptr, numrecs); - } else { - /* - * It's a leaf entry. Make a hole for the new record. - */ - rp = XFS_INOBT_REC_ADDR(block, 1, cur); - memmove(&rp[ptr], &rp[ptr - 1], - (numrecs - ptr + 1) * sizeof(*rp)); - /* - * Now stuff the new record in, bump numrecs - * and log the new data. - */ - rp[ptr - 1] = *recp; - numrecs++; - block->bb_numrecs = cpu_to_be16(numrecs); - xfs_inobt_log_recs(cur, bp, ptr, numrecs); - } - /* - * Log the new number of records in the btree header. - */ - xfs_inobt_log_block(cur->bc_tp, bp, XFS_BB_NUMRECS); -#ifdef DEBUG - /* - * Check that the key/record is in the right place, now. - */ - if (ptr < numrecs) { - if (level == 0) - xfs_btree_check_rec(cur->bc_btnum, rp + ptr - 1, - rp + ptr); - else - xfs_btree_check_key(cur->bc_btnum, kp + ptr - 1, - kp + ptr); - } -#endif - /* - * If we inserted at the start of a block, update the parents' keys. - */ - if (optr == 1 && (error = xfs_inobt_updkey(cur, &key, level + 1))) - return error; - /* - * Return the new block number, if any. - * If there is one, give back a record value and a cursor too. 
- */ - *bnop = nbno; - if (nbno != NULLAGBLOCK) { - *recp = nrec; - *curp = ncur; - } + ASSERT(args.len == 1); + XFS_BTREE_TRACE_CURSOR(cur, EXIT); + + new->u.inobt = cpu_to_be32(XFS_FSB_TO_AGBNO(args.mp, args.fsbno)); *stat = 1; return 0; } +STATIC int +xfs_inobt_free_block( + xfs_btree_cur_t *cur, + xfs_buf_t *bp, + int size) +{ + int error; + + error = xfs_free_extent(cur->bc_tp, + XFS_DADDR_TO_FSB(cur->bc_mp, XFS_BUF_ADDR(bp)), 1); + if (error) + return error; + xfs_trans_binval(cur->bc_tp, bp); + return 0; +} + /* - * Log header fields from a btree block. + * Log fields from the btree block header. */ STATIC void xfs_inobt_log_block( - xfs_trans_t *tp, /* transaction pointer */ + xfs_btree_cur_t *cur, /* btree cursor */ xfs_buf_t *bp, /* buffer containing btree block */ int fields) /* mask of fields: XFS_BB_... */ { @@ -758,1218 +179,514 @@ xfs_inobt_log_block( sizeof(xfs_inobt_block_t) }; + XFS_BTREE_TRACE_CURSOR(cur, ENTRY); + XFS_BTREE_TRACE_ARGBI(cur, bp, fields); xfs_btree_offsets(fields, offsets, XFS_BB_NUM_BITS, &first, &last); - xfs_trans_log_buf(tp, bp, first, last); + xfs_trans_log_buf(cur->bc_tp, bp, first, last); + XFS_BTREE_TRACE_CURSOR(cur, EXIT); } -/* - * Log keys from a btree block (nonleaf). 
- */ -STATIC void -xfs_inobt_log_keys( - xfs_btree_cur_t *cur, /* btree cursor */ - xfs_buf_t *bp, /* buffer containing btree block */ - int kfirst, /* index of first key to log */ - int klast) /* index of last key to log */ +static const struct xfs_btree_block_ops xfs_inobt_blkops = { + .get_buf = xfs_inobt_get_buf, + .read_buf = xfs_inobt_read_buf, + .get_block = xfs_inobt_get_block, + .buf_to_block = xfs_inobt_buf_to_block, + .buf_to_ptr = xfs_inobt_buf_to_ptr, + .log_block = xfs_inobt_log_block, + .check_block = xfs_btree_check_sblock, + + .alloc_block = xfs_inobt_alloc_block, + .free_block = xfs_inobt_free_block, + + .get_sibling = xfs_btree_get_ssibling, + .set_sibling = xfs_btree_set_ssibling, + .init_sibling = xfs_btree_init_sibling, +}; + +STATIC int +xfs_inobt_get_minrecs( + xfs_btree_cur_t *cur, + int lev) { - xfs_inobt_block_t *block; /* btree block to log from */ - int first; /* first byte offset logged */ - xfs_inobt_key_t *kp; /* key pointer in btree block */ - int last; /* last byte offset logged */ + return cur->bc_mp->m_inobt_mnr[lev != 0]; +} - block = XFS_BUF_TO_INOBT_BLOCK(bp); - kp = XFS_INOBT_KEY_ADDR(block, 1, cur); - first = (int)((xfs_caddr_t)&kp[kfirst - 1] - (xfs_caddr_t)block); - last = (int)(((xfs_caddr_t)&kp[klast] - 1) - (xfs_caddr_t)block); - xfs_trans_log_buf(cur->bc_tp, bp, first, last); +STATIC int +xfs_inobt_get_maxrecs( + xfs_btree_cur_t *cur, + int lev) +{ + return cur->bc_mp->m_inobt_mxr[lev != 0]; +} + +STATIC int +xfs_btree_get_numrecs( + xfs_btree_cur_t *cur, + xfs_btree_block_t *block) +{ + return be16_to_cpu(block->bb_h.bb_numrecs); } -/* - * Log block pointer fields from a btree block (nonleaf). 
- */ STATIC void -xfs_inobt_log_ptrs( - xfs_btree_cur_t *cur, /* btree cursor */ - xfs_buf_t *bp, /* buffer containing btree block */ - int pfirst, /* index of first pointer to log */ - int plast) /* index of last pointer to log */ +xfs_btree_set_numrecs( + xfs_btree_cur_t *cur, + xfs_btree_block_t *block, + int numrecs) { - xfs_inobt_block_t *block; /* btree block to log from */ - int first; /* first byte offset logged */ - int last; /* last byte offset logged */ - xfs_inobt_ptr_t *pp; /* block-pointer pointer in btree blk */ + block->bb_h.bb_numrecs = cpu_to_be16(numrecs); +} - block = XFS_BUF_TO_INOBT_BLOCK(bp); - pp = XFS_INOBT_PTR_ADDR(block, 1, cur); - first = (int)((xfs_caddr_t)&pp[pfirst - 1] - (xfs_caddr_t)block); - last = (int)(((xfs_caddr_t)&pp[plast] - 1) - (xfs_caddr_t)block); - xfs_trans_log_buf(cur->bc_tp, bp, first, last); +STATIC void +xfs_inobt_init_key_from_rec( + xfs_btree_cur_t *cur, + xfs_btree_key_t *key, + xfs_btree_rec_t *rec) +{ + key->u.inobt.ir_startino = rec->u.inobt.ir_startino; } /* - * Log records from a btree block (leaf). + * initial value of ptr for lookup */ STATIC void -xfs_inobt_log_recs( - xfs_btree_cur_t *cur, /* btree cursor */ - xfs_buf_t *bp, /* buffer containing btree block */ - int rfirst, /* index of first record to log */ - int rlast) /* index of last record to log */ +xfs_inobt_init_ptr_from_cur( + xfs_btree_cur_t *cur, + xfs_btree_ptr_t *ptr) { - xfs_inobt_block_t *block; /* btree block to log from */ - int first; /* first byte offset logged */ - int last; /* last byte offset logged */ - xfs_inobt_rec_t *rp; /* record pointer for btree block */ + xfs_agi_t *agi; /* a.g.
inode header */ - block = XFS_BUF_TO_INOBT_BLOCK(bp); - rp = XFS_INOBT_REC_ADDR(block, 1, cur); - first = (int)((xfs_caddr_t)&rp[rfirst - 1] - (xfs_caddr_t)block); - last = (int)(((xfs_caddr_t)&rp[rlast] - 1) - (xfs_caddr_t)block); - xfs_trans_log_buf(cur->bc_tp, bp, first, last); + agi = XFS_BUF_TO_AGI(cur->bc_private.i.agbp); + ASSERT(cur->bc_private.i.agno == be32_to_cpu(agi->agi_seqno)); + + ptr->u.inobt = agi->agi_root; } -/* - * Lookup the record. The cursor is made to point to it, based on dir. - * Return 0 if can't find any such record, 1 for success. - */ -STATIC int /* error */ -xfs_inobt_lookup( - xfs_btree_cur_t *cur, /* btree cursor */ - xfs_lookup_t dir, /* <=, ==, or >= */ - int *stat) /* success/failure */ +STATIC void +xfs_inobt_init_rec_from_key( + xfs_btree_cur_t *cur, + xfs_btree_key_t *key, + xfs_btree_rec_t *rec) { - xfs_agblock_t agbno; /* a.g. relative btree block number */ - xfs_agnumber_t agno; /* allocation group number */ - xfs_inobt_block_t *block=NULL; /* current btree block */ - __int64_t diff; /* difference for the current key */ - int error; /* error return value */ - int keyno=0; /* current key number */ - int level; /* level in the btree */ - xfs_mount_t *mp; /* file system mount point */ - - /* - * Get the allocation group header, and the root block number. - */ - mp = cur->bc_mp; - { - xfs_agi_t *agi; /* a.g. inode header */ - - agi = XFS_BUF_TO_AGI(cur->bc_private.i.agbp); - agno = be32_to_cpu(agi->agi_seqno); - agbno = be32_to_cpu(agi->agi_root); - } - /* - * Iterate over each level in the btree, starting at the root. - * For each level above the leaves, find the key we need, based - * on the lookup record, then follow the corresponding block - * pointer down to the next level. - */ - for (level = cur->bc_nlevels - 1, diff = 1; level >= 0; level--) { - xfs_buf_t *bp; /* buffer pointer for btree block */ - xfs_daddr_t d; /* disk address of btree block */ - - /* - * Get the disk address we're looking for. 
- */ - d = XFS_AGB_TO_DADDR(mp, agno, agbno); - /* - * If the old buffer at this level is for a different block, - * throw it away, otherwise just use it. - */ - bp = cur->bc_bufs[level]; - if (bp && XFS_BUF_ADDR(bp) != d) - bp = NULL; - if (!bp) { - /* - * Need to get a new buffer. Read it, then - * set it in the cursor, releasing the old one. - */ - if ((error = xfs_btree_read_bufs(mp, cur->bc_tp, - agno, agbno, 0, &bp, XFS_INO_BTREE_REF))) - return error; - xfs_btree_setbuf(cur, level, bp); - /* - * Point to the btree block, now that we have the buffer - */ - block = XFS_BUF_TO_INOBT_BLOCK(bp); - if ((error = xfs_btree_check_sblock(cur, block, level, - bp))) - return error; - } else - block = XFS_BUF_TO_INOBT_BLOCK(bp); - /* - * If we already had a key match at a higher level, we know - * we need to use the first entry in this block. - */ - if (diff == 0) - keyno = 1; - /* - * Otherwise we need to search this block. Do a binary search. - */ - else { - int high; /* high entry number */ - xfs_inobt_key_t *kkbase=NULL;/* base of keys in block */ - xfs_inobt_rec_t *krbase=NULL;/* base of records in block */ - int low; /* low entry number */ - - /* - * Get a pointer to keys or records. - */ - if (level > 0) - kkbase = XFS_INOBT_KEY_ADDR(block, 1, cur); - else - krbase = XFS_INOBT_REC_ADDR(block, 1, cur); - /* - * Set low and high entry numbers, 1-based. - */ - low = 1; - if (!(high = be16_to_cpu(block->bb_numrecs))) { - /* - * If the block is empty, the tree must - * be an empty leaf. - */ - ASSERT(level == 0 && cur->bc_nlevels == 1); - cur->bc_ptrs[0] = dir != XFS_LOOKUP_LE; - *stat = 0; - return 0; - } - /* - * Binary search the block. - */ - while (low <= high) { - xfs_agino_t startino; /* key value */ - - /* - * keyno is average of low and high. - */ - keyno = (low + high) >> 1; - /* - * Get startino. 
- */ - if (level > 0) { - xfs_inobt_key_t *kkp; - - kkp = kkbase + keyno - 1; - startino = be32_to_cpu(kkp->ir_startino); - } else { - xfs_inobt_rec_t *krp; - - krp = krbase + keyno - 1; - startino = be32_to_cpu(krp->ir_startino); - } - /* - * Compute difference to get next direction. - */ - diff = (__int64_t) - startino - cur->bc_rec.i.ir_startino; - /* - * Less than, move right. - */ - if (diff < 0) - low = keyno + 1; - /* - * Greater than, move left. - */ - else if (diff > 0) - high = keyno - 1; - /* - * Equal, we're done. - */ - else - break; - } - } - /* - * If there are more levels, set up for the next level - * by getting the block number and filling in the cursor. - */ - if (level > 0) { - /* - * If we moved left, need the previous key number, - * unless there isn't one. - */ - if (diff > 0 && --keyno < 1) - keyno = 1; - agbno = be32_to_cpu(*XFS_INOBT_PTR_ADDR(block, keyno, cur)); -#ifdef DEBUG - if ((error = xfs_btree_check_sptr(cur, agbno, level))) - return error; -#endif - cur->bc_ptrs[level] = keyno; - } - } - /* - * Done with the search. - * See if we need to adjust the results. - */ - if (dir != XFS_LOOKUP_LE && diff < 0) { - keyno++; - /* - * If ge search and we went off the end of the block, but it's - * not the last block, we're in the wrong block. - */ - if (dir == XFS_LOOKUP_GE && - keyno > be16_to_cpu(block->bb_numrecs) && - be32_to_cpu(block->bb_rightsib) != NULLAGBLOCK) { - int i; - - cur->bc_ptrs[0] = keyno; - if ((error = xfs_inobt_increment(cur, 0, &i))) - return error; - ASSERT(i == 1); - *stat = 1; - return 0; - } - } - else if (dir == XFS_LOOKUP_LE && diff > 0) - keyno--; - cur->bc_ptrs[0] = keyno; - /* - * Return if we succeeded or not. - */ - if (keyno == 0 || keyno > be16_to_cpu(block->bb_numrecs)) - *stat = 0; - else - *stat = ((dir != XFS_LOOKUP_EQ) || (diff == 0)); - return 0; + rec->u.inobt.ir_startino = key->u.inobt.ir_startino; } -/* - * Move 1 record left from cur/level if possible. - * Update cur to reflect the new path. 
- */ -STATIC int /* error */ -xfs_inobt_lshift( - xfs_btree_cur_t *cur, /* btree cursor */ - int level, /* level to shift record on */ - int *stat) /* success/failure */ +STATIC void +xfs_inobt_init_rec_from_cur( + xfs_btree_cur_t *cur, + xfs_btree_rec_t *rec) { - int error; /* error return value */ -#ifdef DEBUG - int i; /* loop index */ -#endif - xfs_inobt_key_t key; /* key value for leaf level upward */ - xfs_buf_t *lbp; /* buffer for left neighbor block */ - xfs_inobt_block_t *left; /* left neighbor btree block */ - xfs_inobt_key_t *lkp=NULL; /* key pointer for left block */ - xfs_inobt_ptr_t *lpp; /* address pointer for left block */ - xfs_inobt_rec_t *lrp=NULL; /* record pointer for left block */ - int nrec; /* new number of left block entries */ - xfs_buf_t *rbp; /* buffer for right (current) block */ - xfs_inobt_block_t *right; /* right (current) btree block */ - xfs_inobt_key_t *rkp=NULL; /* key pointer for right block */ - xfs_inobt_ptr_t *rpp=NULL; /* address pointer for right block */ - xfs_inobt_rec_t *rrp=NULL; /* record pointer for right block */ + rec->u.inobt.ir_startino = cpu_to_be32(cur->bc_rec.i.ir_startino); + rec->u.inobt.ir_freecount = cpu_to_be32(cur->bc_rec.i.ir_freecount); + rec->u.inobt.ir_free = cpu_to_be64(cur->bc_rec.i.ir_free); +} - /* - * Set up variables for this block as "right". - */ - rbp = cur->bc_bufs[level]; - right = XFS_BUF_TO_INOBT_BLOCK(rbp); -#ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, right, level, rbp))) - return error; -#endif - /* - * If we've got no left sibling then we can't shift an entry left. - */ - if (be32_to_cpu(right->bb_leftsib) == NULLAGBLOCK) { - *stat = 0; - return 0; - } - /* - * If the cursor entry is the one that would be moved, don't - * do it... it's too complicated. - */ - if (cur->bc_ptrs[level] <= 1) { - *stat = 0; - return 0; - } - /* - * Set up the left neighbor as "left". 
- */ - if ((error = xfs_btree_read_bufs(cur->bc_mp, cur->bc_tp, - cur->bc_private.i.agno, be32_to_cpu(right->bb_leftsib), - 0, &lbp, XFS_INO_BTREE_REF))) - return error; - left = XFS_BUF_TO_INOBT_BLOCK(lbp); - if ((error = xfs_btree_check_sblock(cur, left, level, lbp))) - return error; - /* - * If it's full, it can't take another entry. - */ - if (be16_to_cpu(left->bb_numrecs) == XFS_INOBT_BLOCK_MAXRECS(level, cur)) { - *stat = 0; - return 0; - } - nrec = be16_to_cpu(left->bb_numrecs) + 1; - /* - * If non-leaf, copy a key and a ptr to the left block. - */ - if (level > 0) { - lkp = XFS_INOBT_KEY_ADDR(left, nrec, cur); - rkp = XFS_INOBT_KEY_ADDR(right, 1, cur); - *lkp = *rkp; - xfs_inobt_log_keys(cur, lbp, nrec, nrec); - lpp = XFS_INOBT_PTR_ADDR(left, nrec, cur); - rpp = XFS_INOBT_PTR_ADDR(right, 1, cur); -#ifdef DEBUG - if ((error = xfs_btree_check_sptr(cur, be32_to_cpu(*rpp), level))) - return error; -#endif - *lpp = *rpp; - xfs_inobt_log_ptrs(cur, lbp, nrec, nrec); - } - /* - * If leaf, copy a record to the left block. - */ - else { - lrp = XFS_INOBT_REC_ADDR(left, nrec, cur); - rrp = XFS_INOBT_REC_ADDR(right, 1, cur); - *lrp = *rrp; - xfs_inobt_log_recs(cur, lbp, nrec, nrec); - } - /* - * Bump and log left's numrecs, decrement and log right's numrecs. - */ - be16_add(&left->bb_numrecs, 1); - xfs_inobt_log_block(cur->bc_tp, lbp, XFS_BB_NUMRECS); -#ifdef DEBUG - if (level > 0) - xfs_btree_check_key(cur->bc_btnum, lkp - 1, lkp); - else - xfs_btree_check_rec(cur->bc_btnum, lrp - 1, lrp); -#endif - be16_add(&right->bb_numrecs, -1); - xfs_inobt_log_block(cur->bc_tp, rbp, XFS_BB_NUMRECS); - /* - * Slide the contents of right down one entry. 
- */ - if (level > 0) { -#ifdef DEBUG - for (i = 0; i < be16_to_cpu(right->bb_numrecs); i++) { - if ((error = xfs_btree_check_sptr(cur, be32_to_cpu(rpp[i + 1]), - level))) - return error; - } -#endif - memmove(rkp, rkp + 1, be16_to_cpu(right->bb_numrecs) * sizeof(*rkp)); - memmove(rpp, rpp + 1, be16_to_cpu(right->bb_numrecs) * sizeof(*rpp)); - xfs_inobt_log_keys(cur, rbp, 1, be16_to_cpu(right->bb_numrecs)); - xfs_inobt_log_ptrs(cur, rbp, 1, be16_to_cpu(right->bb_numrecs)); - } else { - memmove(rrp, rrp + 1, be16_to_cpu(right->bb_numrecs) * sizeof(*rrp)); - xfs_inobt_log_recs(cur, rbp, 1, be16_to_cpu(right->bb_numrecs)); - key.ir_startino = rrp->ir_startino; - rkp = &key; - } - /* - * Update the parent key values of right. - */ - if ((error = xfs_inobt_updkey(cur, rkp, level + 1))) - return error; - /* - * Slide the cursor value left one. - */ - cur->bc_ptrs[level]--; - *stat = 1; - return 0; +STATIC xfs_btree_key_t * +xfs_inobt_key_addr( + xfs_btree_cur_t *cur, + int index, + xfs_btree_block_t *block) +{ + return (xfs_btree_key_t *)XFS_INOBT_KEY_ADDR(&block->bb_h, index, cur); } -/* - * Allocate a new root block, fill it in. - */ -STATIC int /* error */ -xfs_inobt_newroot( - xfs_btree_cur_t *cur, /* btree cursor */ - int *stat) /* success/failure */ +STATIC xfs_btree_ptr_t * +xfs_inobt_ptr_addr( + xfs_btree_cur_t *cur, + int index, + xfs_btree_block_t *block) { - xfs_agi_t *agi; /* a.g. 
inode header */ - xfs_alloc_arg_t args; /* allocation argument structure */ - xfs_inobt_block_t *block; /* one half of the old root block */ - xfs_buf_t *bp; /* buffer containing block */ - int error; /* error return value */ - xfs_inobt_key_t *kp; /* btree key pointer */ - xfs_agblock_t lbno; /* left block number */ - xfs_buf_t *lbp; /* left buffer pointer */ - xfs_inobt_block_t *left; /* left btree block */ - xfs_buf_t *nbp; /* new (root) buffer */ - xfs_inobt_block_t *new; /* new (root) btree block */ - int nptr; /* new value for key index, 1 or 2 */ - xfs_inobt_ptr_t *pp; /* btree address pointer */ - xfs_agblock_t rbno; /* right block number */ - xfs_buf_t *rbp; /* right buffer pointer */ - xfs_inobt_block_t *right; /* right btree block */ - xfs_inobt_rec_t *rp; /* btree record pointer */ + return (xfs_btree_ptr_t *)XFS_INOBT_PTR_ADDR(&block->bb_h, index, cur); +} - ASSERT(cur->bc_nlevels < XFS_IN_MAXLEVELS(cur->bc_mp)); +STATIC xfs_btree_rec_t * +xfs_inobt_rec_addr( + xfs_btree_cur_t *cur, + int index, + xfs_btree_block_t *block) +{ + return (xfs_btree_rec_t *)XFS_INOBT_REC_ADDR(&block->bb_h, index, cur); +} - /* - * Get a block & a buffer. - */ - agi = XFS_BUF_TO_AGI(cur->bc_private.i.agbp); - args.tp = cur->bc_tp; - args.mp = cur->bc_mp; - args.fsbno = XFS_AGB_TO_FSB(args.mp, cur->bc_private.i.agno, - be32_to_cpu(agi->agi_root)); - args.mod = args.minleft = args.alignment = args.total = args.wasdel = - args.isfl = args.userdata = args.minalignslop = 0; - args.minlen = args.maxlen = args.prod = 1; - args.type = XFS_ALLOCTYPE_NEAR_BNO; - if ((error = xfs_alloc_vextent(&args))) - return error; - /* - * None available, we fail. - */ - if (args.fsbno == NULLFSBLOCK) { - *stat = 0; - return 0; - } - ASSERT(args.len == 1); - nbp = xfs_btree_get_bufs(args.mp, args.tp, args.agno, args.agbno, 0); - new = XFS_BUF_TO_INOBT_BLOCK(nbp); - /* - * Set the root data in the a.g. inode structure. 
- */ - agi->agi_root = cpu_to_be32(args.agbno); - be32_add(&agi->agi_level, 1); - xfs_ialloc_log_agi(args.tp, cur->bc_private.i.agbp, - XFS_AGI_ROOT | XFS_AGI_LEVEL); - /* - * At the previous root level there are now two blocks: the old - * root, and the new block generated when it was split. - * We don't know which one the cursor is pointing at, so we - * set up variables "left" and "right" for each case. - */ - bp = cur->bc_bufs[cur->bc_nlevels - 1]; - block = XFS_BUF_TO_INOBT_BLOCK(bp); -#ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, block, cur->bc_nlevels - 1, bp))) - return error; -#endif - if (be32_to_cpu(block->bb_rightsib) != NULLAGBLOCK) { - /* - * Our block is left, pick up the right block. - */ - lbp = bp; - lbno = XFS_DADDR_TO_AGBNO(args.mp, XFS_BUF_ADDR(lbp)); - left = block; - rbno = be32_to_cpu(left->bb_rightsib); - if ((error = xfs_btree_read_bufs(args.mp, args.tp, args.agno, - rbno, 0, &rbp, XFS_INO_BTREE_REF))) - return error; - bp = rbp; - right = XFS_BUF_TO_INOBT_BLOCK(rbp); - if ((error = xfs_btree_check_sblock(cur, right, - cur->bc_nlevels - 1, rbp))) - return error; - nptr = 1; - } else { - /* - * Our block is right, pick up the left block. - */ - rbp = bp; - rbno = XFS_DADDR_TO_AGBNO(args.mp, XFS_BUF_ADDR(rbp)); - right = block; - lbno = be32_to_cpu(right->bb_leftsib); - if ((error = xfs_btree_read_bufs(args.mp, args.tp, args.agno, - lbno, 0, &lbp, XFS_INO_BTREE_REF))) - return error; - bp = lbp; - left = XFS_BUF_TO_INOBT_BLOCK(lbp); - if ((error = xfs_btree_check_sblock(cur, left, - cur->bc_nlevels - 1, lbp))) - return error; - nptr = 2; - } - /* - * Fill in the new block's btree header and log it. 
- */ - new->bb_magic = cpu_to_be32(xfs_magics[cur->bc_btnum]); - new->bb_level = cpu_to_be16(cur->bc_nlevels); - new->bb_numrecs = cpu_to_be16(2); - new->bb_leftsib = cpu_to_be32(NULLAGBLOCK); - new->bb_rightsib = cpu_to_be32(NULLAGBLOCK); - xfs_inobt_log_block(args.tp, nbp, XFS_BB_ALL_BITS); - ASSERT(lbno != NULLAGBLOCK && rbno != NULLAGBLOCK); - /* - * Fill in the key data in the new root. - */ - kp = XFS_INOBT_KEY_ADDR(new, 1, cur); - if (be16_to_cpu(left->bb_level) > 0) { - kp[0] = *XFS_INOBT_KEY_ADDR(left, 1, cur); - kp[1] = *XFS_INOBT_KEY_ADDR(right, 1, cur); - } else { - rp = XFS_INOBT_REC_ADDR(left, 1, cur); - kp[0].ir_startino = rp->ir_startino; - rp = XFS_INOBT_REC_ADDR(right, 1, cur); - kp[1].ir_startino = rp->ir_startino; - } - xfs_inobt_log_keys(cur, nbp, 1, 2); - /* - * Fill in the pointer data in the new root. - */ - pp = XFS_INOBT_PTR_ADDR(new, 1, cur); - pp[0] = cpu_to_be32(lbno); - pp[1] = cpu_to_be32(rbno); - xfs_inobt_log_ptrs(cur, nbp, 1, 2); - /* - * Fix up the cursor. - */ - xfs_btree_setbuf(cur, cur->bc_nlevels, nbp); - cur->bc_ptrs[cur->bc_nlevels] = nptr; - cur->bc_nlevels++; - *stat = 1; - return 0; +STATIC int64_t +xfs_inobt_key_diff( + xfs_btree_cur_t *cur, + xfs_btree_key_t *key) +{ + return (int64_t)(be32_to_cpu(key->u.inobt.ir_startino)) - + cur->bc_rec.i.ir_startino; } -/* - * Move 1 record right from cur/level if possible. - * Update cur to reflect the new path. 
- */ -STATIC int /* error */ -xfs_inobt_rshift( - xfs_btree_cur_t *cur, /* btree cursor */ - int level, /* level to shift record on */ - int *stat) /* success/failure */ +STATIC xfs_daddr_t +xfs_inobt_ptr_to_daddr( + xfs_btree_cur_t *cur, + xfs_btree_ptr_t *ptr) { - int error; /* error return value */ - int i; /* loop index */ - xfs_inobt_key_t key; /* key value for leaf level upward */ - xfs_buf_t *lbp; /* buffer for left (current) block */ - xfs_inobt_block_t *left; /* left (current) btree block */ - xfs_inobt_key_t *lkp; /* key pointer for left block */ - xfs_inobt_ptr_t *lpp; /* address pointer for left block */ - xfs_inobt_rec_t *lrp; /* record pointer for left block */ - xfs_buf_t *rbp; /* buffer for right neighbor block */ - xfs_inobt_block_t *right; /* right neighbor btree block */ - xfs_inobt_key_t *rkp; /* key pointer for right block */ - xfs_inobt_ptr_t *rpp; /* address pointer for right block */ - xfs_inobt_rec_t *rrp=NULL; /* record pointer for right block */ - xfs_btree_cur_t *tcur; /* temporary cursor */ + return XFS_AGB_TO_DADDR(cur->bc_mp, cur->bc_private.i.agno, + be32_to_cpu(ptr->u.inobt)); +} - /* - * Set up variables for this block as "left". - */ - lbp = cur->bc_bufs[level]; - left = XFS_BUF_TO_INOBT_BLOCK(lbp); -#ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, left, level, lbp))) - return error; -#endif - /* - * If we've got no right sibling then we can't shift an entry right. - */ - if (be32_to_cpu(left->bb_rightsib) == NULLAGBLOCK) { - *stat = 0; - return 0; - } - /* - * If the cursor entry is the one that would be moved, don't - * do it... it's too complicated. - */ - if (cur->bc_ptrs[level] >= be16_to_cpu(left->bb_numrecs)) { - *stat = 0; - return 0; - } - /* - * Set up the right neighbor as "right". 
- */ - if ((error = xfs_btree_read_bufs(cur->bc_mp, cur->bc_tp, - cur->bc_private.i.agno, be32_to_cpu(left->bb_rightsib), - 0, &rbp, XFS_INO_BTREE_REF))) - return error; - right = XFS_BUF_TO_INOBT_BLOCK(rbp); - if ((error = xfs_btree_check_sblock(cur, right, level, rbp))) - return error; - /* - * If it's full, it can't take another entry. - */ - if (be16_to_cpu(right->bb_numrecs) == XFS_INOBT_BLOCK_MAXRECS(level, cur)) { - *stat = 0; - return 0; - } - /* - * Make a hole at the start of the right neighbor block, then - * copy the last left block entry to the hole. - */ - if (level > 0) { - lkp = XFS_INOBT_KEY_ADDR(left, be16_to_cpu(left->bb_numrecs), cur); - lpp = XFS_INOBT_PTR_ADDR(left, be16_to_cpu(left->bb_numrecs), cur); - rkp = XFS_INOBT_KEY_ADDR(right, 1, cur); - rpp = XFS_INOBT_PTR_ADDR(right, 1, cur); -#ifdef DEBUG - for (i = be16_to_cpu(right->bb_numrecs) - 1; i >= 0; i--) { - if ((error = xfs_btree_check_sptr(cur, be32_to_cpu(rpp[i]), level))) - return error; - } -#endif - memmove(rkp + 1, rkp, be16_to_cpu(right->bb_numrecs) * sizeof(*rkp)); - memmove(rpp + 1, rpp, be16_to_cpu(right->bb_numrecs) * sizeof(*rpp)); -#ifdef DEBUG - if ((error = xfs_btree_check_sptr(cur, be32_to_cpu(*lpp), level))) - return error; -#endif - *rkp = *lkp; - *rpp = *lpp; - xfs_inobt_log_keys(cur, rbp, 1, be16_to_cpu(right->bb_numrecs) + 1); - xfs_inobt_log_ptrs(cur, rbp, 1, be16_to_cpu(right->bb_numrecs) + 1); +STATIC void +xfs_inobt_move_keys( + xfs_btree_cur_t *cur, + xfs_btree_key_t *src_key, + xfs_btree_key_t *dst_key, + int from, + int to, + int numkeys) +{ + if (dst_key == NULL) { + /* moving within a block */ + xfs_inobt_key_t *kp = &src_key->u.inobt; + memmove(&kp[to], &kp[from], numkeys * sizeof(*kp)); } else { - lrp = XFS_INOBT_REC_ADDR(left, be16_to_cpu(left->bb_numrecs), cur); - rrp = XFS_INOBT_REC_ADDR(right, 1, cur); - memmove(rrp + 1, rrp, be16_to_cpu(right->bb_numrecs) * sizeof(*rrp)); - *rrp = *lrp; - xfs_inobt_log_recs(cur, rbp, 1, be16_to_cpu(right->bb_numrecs) 
+ 1); - key.ir_startino = rrp->ir_startino; - rkp = &key; + /* moving between blocks */ + memcpy(dst_key, src_key, numkeys * sizeof(xfs_inobt_key_t)); } - /* - * Decrement and log left's numrecs, bump and log right's numrecs. - */ - be16_add(&left->bb_numrecs, -1); - xfs_inobt_log_block(cur->bc_tp, lbp, XFS_BB_NUMRECS); - be16_add(&right->bb_numrecs, 1); -#ifdef DEBUG - if (level > 0) - xfs_btree_check_key(cur->bc_btnum, rkp, rkp + 1); - else - xfs_btree_check_rec(cur->bc_btnum, rrp, rrp + 1); -#endif - xfs_inobt_log_block(cur->bc_tp, rbp, XFS_BB_NUMRECS); - /* - * Using a temporary cursor, update the parent key values of the - * block on the right. - */ - if ((error = xfs_btree_dup_cursor(cur, &tcur))) - return error; - xfs_btree_lastrec(tcur, level); - if ((error = xfs_inobt_increment(tcur, level, &i)) || - (error = xfs_inobt_updkey(tcur, rkp, level + 1))) { - xfs_btree_del_cursor(tcur, XFS_BTREE_ERROR); - return error; - } - xfs_btree_del_cursor(tcur, XFS_BTREE_NOERROR); - *stat = 1; - return 0; } -/* - * Split cur/level block in half. - * Return new block number and its first record (to be inserted into parent). 
- */ -STATIC int /* error */ -xfs_inobt_split( - xfs_btree_cur_t *cur, /* btree cursor */ - int level, /* level to split */ - xfs_agblock_t *bnop, /* output: block number allocated */ - xfs_inobt_key_t *keyp, /* output: first key of new block */ - xfs_btree_cur_t **curp, /* output: new cursor */ - int *stat) /* success/failure */ -{ - xfs_alloc_arg_t args; /* allocation argument structure */ - int error; /* error return value */ - int i; /* loop index/record number */ - xfs_agblock_t lbno; /* left (current) block number */ - xfs_buf_t *lbp; /* buffer for left block */ - xfs_inobt_block_t *left; /* left (current) btree block */ - xfs_inobt_key_t *lkp; /* left btree key pointer */ - xfs_inobt_ptr_t *lpp; /* left btree address pointer */ - xfs_inobt_rec_t *lrp; /* left btree record pointer */ - xfs_buf_t *rbp; /* buffer for right block */ - xfs_inobt_block_t *right; /* right (new) btree block */ - xfs_inobt_key_t *rkp; /* right btree key pointer */ - xfs_inobt_ptr_t *rpp; /* right btree address pointer */ - xfs_inobt_rec_t *rrp; /* right btree record pointer */ - - /* - * Set up left block (current one). - */ - lbp = cur->bc_bufs[level]; - args.tp = cur->bc_tp; - args.mp = cur->bc_mp; - lbno = XFS_DADDR_TO_AGBNO(args.mp, XFS_BUF_ADDR(lbp)); - /* - * Allocate the new block. - * If we can't do it, we're toast. Give up. - */ - args.fsbno = XFS_AGB_TO_FSB(args.mp, cur->bc_private.i.agno, lbno); - args.mod = args.minleft = args.alignment = args.total = args.wasdel = - args.isfl = args.userdata = args.minalignslop = 0; - args.minlen = args.maxlen = args.prod = 1; - args.type = XFS_ALLOCTYPE_NEAR_BNO; - if ((error = xfs_alloc_vextent(&args))) - return error; - if (args.fsbno == NULLFSBLOCK) { - *stat = 0; - return 0; - } - ASSERT(args.len == 1); - rbp = xfs_btree_get_bufs(args.mp, args.tp, args.agno, args.agbno, 0); - /* - * Set up the new block as "right". - */ - right = XFS_BUF_TO_INOBT_BLOCK(rbp); - /* - * "Left" is the current (according to the cursor) block. 
- */ - left = XFS_BUF_TO_INOBT_BLOCK(lbp); -#ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, left, level, lbp))) - return error; -#endif - /* - * Fill in the btree header for the new block. - */ - right->bb_magic = cpu_to_be32(xfs_magics[cur->bc_btnum]); - right->bb_level = left->bb_level; - right->bb_numrecs = cpu_to_be16(be16_to_cpu(left->bb_numrecs) / 2); - /* - * Make sure that if there's an odd number of entries now, that - * each new block will have the same number of entries. - */ - if ((be16_to_cpu(left->bb_numrecs) & 1) && - cur->bc_ptrs[level] <= be16_to_cpu(right->bb_numrecs) + 1) - be16_add(&right->bb_numrecs, 1); - i = be16_to_cpu(left->bb_numrecs) - be16_to_cpu(right->bb_numrecs) + 1; - /* - * For non-leaf blocks, copy keys and addresses over to the new block. - */ - if (level > 0) { - lkp = XFS_INOBT_KEY_ADDR(left, i, cur); - lpp = XFS_INOBT_PTR_ADDR(left, i, cur); - rkp = XFS_INOBT_KEY_ADDR(right, 1, cur); - rpp = XFS_INOBT_PTR_ADDR(right, 1, cur); -#ifdef DEBUG - for (i = 0; i < be16_to_cpu(right->bb_numrecs); i++) { - if ((error = xfs_btree_check_sptr(cur, be32_to_cpu(lpp[i]), level))) - return error; - } -#endif - memcpy(rkp, lkp, be16_to_cpu(right->bb_numrecs) * sizeof(*rkp)); - memcpy(rpp, lpp, be16_to_cpu(right->bb_numrecs) * sizeof(*rpp)); - xfs_inobt_log_keys(cur, rbp, 1, be16_to_cpu(right->bb_numrecs)); - xfs_inobt_log_ptrs(cur, rbp, 1, be16_to_cpu(right->bb_numrecs)); - *keyp = *rkp; - } - /* - * For leaf blocks, copy records over to the new block. - */ - else { - lrp = XFS_INOBT_REC_ADDR(left, i, cur); - rrp = XFS_INOBT_REC_ADDR(right, 1, cur); - memcpy(rrp, lrp, be16_to_cpu(right->bb_numrecs) * sizeof(*rrp)); - xfs_inobt_log_recs(cur, rbp, 1, be16_to_cpu(right->bb_numrecs)); - keyp->ir_startino = rrp->ir_startino; - } - /* - * Find the left block number by looking in the buffer. - * Adjust numrecs, sibling pointers. 
- */ - be16_add(&left->bb_numrecs, -(be16_to_cpu(right->bb_numrecs))); - right->bb_rightsib = left->bb_rightsib; - left->bb_rightsib = cpu_to_be32(args.agbno); - right->bb_leftsib = cpu_to_be32(lbno); - xfs_inobt_log_block(args.tp, rbp, XFS_BB_ALL_BITS); - xfs_inobt_log_block(args.tp, lbp, XFS_BB_NUMRECS | XFS_BB_RIGHTSIB); - /* - * If there's a block to the new block's right, make that block - * point back to right instead of to left. - */ - if (be32_to_cpu(right->bb_rightsib) != NULLAGBLOCK) { - xfs_inobt_block_t *rrblock; /* rr btree block */ - xfs_buf_t *rrbp; /* buffer for rrblock */ - - if ((error = xfs_btree_read_bufs(args.mp, args.tp, args.agno, - be32_to_cpu(right->bb_rightsib), 0, &rrbp, - XFS_INO_BTREE_REF))) - return error; - rrblock = XFS_BUF_TO_INOBT_BLOCK(rrbp); - if ((error = xfs_btree_check_sblock(cur, rrblock, level, rrbp))) - return error; - rrblock->bb_leftsib = cpu_to_be32(args.agbno); - xfs_inobt_log_block(args.tp, rrbp, XFS_BB_LEFTSIB); - } - /* - * If the cursor is really in the right block, move it there. - * If it's just pointing past the last entry in left, then we'll - * insert there, so don't change anything in that case. - */ - if (cur->bc_ptrs[level] > be16_to_cpu(left->bb_numrecs) + 1) { - xfs_btree_setbuf(cur, level, rbp); - cur->bc_ptrs[level] -= be16_to_cpu(left->bb_numrecs); +STATIC void +xfs_inobt_move_ptrs( + xfs_btree_cur_t *cur, + xfs_btree_ptr_t *src_ptr, + xfs_btree_ptr_t *dst_ptr, + int from, + int to, + int numptrs) +{ + if (dst_ptr == NULL) { + /* moving within a block */ + xfs_inobt_ptr_t *pp = &src_ptr->u.inobt; + memmove(&pp[to], &pp[from], numptrs * sizeof(*pp)); + } else { + /* moving between blocks */ + memcpy(dst_ptr, src_ptr, numptrs * sizeof(xfs_inobt_ptr_t)); } - /* - * If there are more levels, we'll need another cursor which refers - * the right block, no matter where this cursor was. 
- */ - if (level + 1 < cur->bc_nlevels) { - if ((error = xfs_btree_dup_cursor(cur, curp))) - return error; - (*curp)->bc_ptrs[level + 1]++; +} + +STATIC void +xfs_inobt_move_recs( + xfs_btree_cur_t *cur, + xfs_btree_rec_t *src_rec, + xfs_btree_rec_t *dst_rec, + int from, + int to, + int numrecs) +{ + if (dst_rec == NULL) { + /* moving within a block */ + xfs_inobt_rec_t *rp = &src_rec->u.inobt; + memmove(&rp[to], &rp[from], numrecs * sizeof(*rp)); + } else { + /* moving between blocks */ + memcpy(dst_rec, src_rec, numrecs * sizeof(xfs_inobt_rec_t)); } - *bnop = args.agbno; - *stat = 1; - return 0; } -/* - * Update keys at all levels from here to the root along the cursor's path. - */ -STATIC int /* error */ -xfs_inobt_updkey( - xfs_btree_cur_t *cur, /* btree cursor */ - xfs_inobt_key_t *keyp, /* new key value to update to */ - int level) /* starting level for update */ + +STATIC void +xfs_inobt_set_key( + xfs_btree_cur_t *cur, + xfs_btree_key_t *key_addr, + int index, + xfs_btree_key_t *newkey) { - int ptr; /* index of key in block */ + xfs_inobt_key_t *kp = &key_addr->u.inobt; - /* - * Go up the tree from this level toward the root. - * At each level, update the key value to the value input. - * Stop when we reach a level where the cursor isn't pointing - * at the first entry in the block. 
- */ - for (ptr = 1; ptr == 1 && level < cur->bc_nlevels; level++) { - xfs_buf_t *bp; /* buffer for block */ - xfs_inobt_block_t *block; /* btree block */ -#ifdef DEBUG - int error; /* error return value */ -#endif - xfs_inobt_key_t *kp; /* ptr to btree block keys */ + kp[index] = newkey->u.inobt; +} - bp = cur->bc_bufs[level]; - block = XFS_BUF_TO_INOBT_BLOCK(bp); -#ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, block, level, bp))) - return error; -#endif - ptr = cur->bc_ptrs[level]; - kp = XFS_INOBT_KEY_ADDR(block, ptr, cur); - *kp = *keyp; - xfs_inobt_log_keys(cur, bp, ptr, ptr); - } - return 0; +STATIC void +xfs_inobt_set_ptr( + xfs_btree_cur_t *cur, + xfs_btree_ptr_t *ptr_addr, + int index, + xfs_btree_ptr_t *newptr) +{ + xfs_inobt_ptr_t *pp = &ptr_addr->u.inobt; + + pp[index] = newptr->u.inobt; } -/* - * Externally visible routines. - */ +STATIC void +xfs_inobt_set_rec( + xfs_btree_cur_t *cur, + xfs_btree_rec_t *rec_addr, + int index, + xfs_btree_rec_t *newrec) +{ + xfs_inobt_rec_t *rp = &rec_addr->u.inobt; + + rp[index] = newrec->u.inobt; +} /* - * Decrement cursor by one record at the level. - * For nonzero levels the leaf-ward information is untouched. + * Log keys from a btree block (nonleaf). */ -int /* error */ -xfs_inobt_decrement( +STATIC void +xfs_inobt_log_keys( xfs_btree_cur_t *cur, /* btree cursor */ - int level, /* level in btree, 0 is leaf */ - int *stat) /* success/failure */ + xfs_buf_t *bp, /* buffer containing btree block */ + int kfirst, /* index of first key to log */ + int klast) /* index of last key to log */ { - xfs_inobt_block_t *block; /* btree block */ - int error; - int lev; /* btree level */ + xfs_inobt_block_t *block; /* btree block to log from */ + int first; /* first byte offset logged */ + xfs_inobt_key_t *kp; /* key pointer in btree block */ + int last; /* last byte offset logged */ - ASSERT(level < cur->bc_nlevels); - /* - * Read-ahead to the left at this level. 
- */ - xfs_btree_readahead(cur, level, XFS_BTCUR_LEFTRA); - /* - * Decrement the ptr at this level. If we're still in the block - * then we're done. - */ - if (--cur->bc_ptrs[level] > 0) { - *stat = 1; - return 0; - } - /* - * Get a pointer to the btree block. - */ - block = XFS_BUF_TO_INOBT_BLOCK(cur->bc_bufs[level]); -#ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, block, level, - cur->bc_bufs[level]))) - return error; -#endif - /* - * If we just went off the left edge of the tree, return failure. - */ - if (be32_to_cpu(block->bb_leftsib) == NULLAGBLOCK) { - *stat = 0; - return 0; - } - /* - * March up the tree decrementing pointers. - * Stop when we don't go off the left edge of a block. - */ - for (lev = level + 1; lev < cur->bc_nlevels; lev++) { - if (--cur->bc_ptrs[lev] > 0) - break; - /* - * Read-ahead the left block, we're going to read it - * in the next loop. - */ - xfs_btree_readahead(cur, lev, XFS_BTCUR_LEFTRA); - } - /* - * If we went off the root then we are seriously confused. - */ - ASSERT(lev < cur->bc_nlevels); - /* - * Now walk back down the tree, fixing up the cursor's buffer - * pointers and key numbers. 
- */ - for (block = XFS_BUF_TO_INOBT_BLOCK(cur->bc_bufs[lev]); lev > level; ) { - xfs_agblock_t agbno; /* block number of btree block */ - xfs_buf_t *bp; /* buffer containing btree block */ - - agbno = be32_to_cpu(*XFS_INOBT_PTR_ADDR(block, cur->bc_ptrs[lev], cur)); - if ((error = xfs_btree_read_bufs(cur->bc_mp, cur->bc_tp, - cur->bc_private.i.agno, agbno, 0, &bp, - XFS_INO_BTREE_REF))) - return error; - lev--; - xfs_btree_setbuf(cur, lev, bp); - block = XFS_BUF_TO_INOBT_BLOCK(bp); - if ((error = xfs_btree_check_sblock(cur, block, lev, bp))) - return error; - cur->bc_ptrs[lev] = be16_to_cpu(block->bb_numrecs); - } - *stat = 1; - return 0; + XFS_BTREE_TRACE_CURSOR(cur, ENTRY); + XFS_BTREE_TRACE_ARGBII(cur, bp, kfirst, klast); + block = XFS_BUF_TO_INOBT_BLOCK(bp); + kp = XFS_INOBT_KEY_ADDR(block, 1, cur); + first = (int)((xfs_caddr_t)&kp[kfirst - 1] - (xfs_caddr_t)block); + last = (int)(((xfs_caddr_t)&kp[klast] - 1) - (xfs_caddr_t)block); + xfs_trans_log_buf(cur->bc_tp, bp, first, last); + XFS_BTREE_TRACE_CURSOR(cur, EXIT); } /* - * Delete the record pointed to by cur. - * The cursor refers to the place where the record was (could be inserted) - * when the operation returns. + * Log block pointer fields from a btree block (nonleaf). */ -int /* error */ -xfs_inobt_delete( - xfs_btree_cur_t *cur, /* btree cursor */ - int *stat) /* success/failure */ +STATIC void +xfs_inobt_log_ptrs( + xfs_btree_cur_t *cur, /* btree cursor */ + xfs_buf_t *bp, /* buffer containing btree block */ + int pfirst, /* index of first pointer to log */ + int plast) /* index of last pointer to log */ { - int error; - int i; /* result code */ - int level; /* btree level */ + xfs_inobt_block_t *block; /* btree block to log from */ + int first; /* first byte offset logged */ + int last; /* last byte offset logged */ + xfs_inobt_ptr_t *pp; /* block-pointer pointer in btree blk */ - /* - * Go up the tree, starting at leaf level. - * If 2 is returned then a join was done; go to the next level. 
- * Otherwise we are done. - */ - for (level = 0, i = 2; i == 2; level++) { - if ((error = xfs_inobt_delrec(cur, level, &i))) - return error; - } - if (i == 0) { - for (level = 1; level < cur->bc_nlevels; level++) { - if (cur->bc_ptrs[level] == 0) { - if ((error = xfs_inobt_decrement(cur, level, &i))) - return error; - break; - } - } - } - *stat = i; - return 0; + XFS_BTREE_TRACE_CURSOR(cur, ENTRY); + XFS_BTREE_TRACE_ARGBII(cur, bp, pfirst, plast); + block = XFS_BUF_TO_INOBT_BLOCK(bp); + pp = XFS_INOBT_PTR_ADDR(block, 1, cur); + first = (int)((xfs_caddr_t)&pp[pfirst - 1] - (xfs_caddr_t)block); + last = (int)(((xfs_caddr_t)&pp[plast] - 1) - (xfs_caddr_t)block); + xfs_trans_log_buf(cur->bc_tp, bp, first, last); + XFS_BTREE_TRACE_CURSOR(cur, EXIT); } - /* - * Get the data from the pointed-to record. + * Log records from a btree block (leaf). */ -int /* error */ -xfs_inobt_get_rec( +STATIC void +xfs_inobt_log_recs( xfs_btree_cur_t *cur, /* btree cursor */ - xfs_agino_t *ino, /* output: starting inode of chunk */ - __int32_t *fcnt, /* output: number of free inodes */ - xfs_inofree_t *free, /* output: free inode mask */ - int *stat) /* output: success/failure */ + xfs_buf_t *bp, /* buffer containing btree block */ + int rfirst, /* index of first record to log */ + int rlast) /* index of last record to log */ { - xfs_inobt_block_t *block; /* btree block */ - xfs_buf_t *bp; /* buffer containing btree block */ -#ifdef DEBUG - int error; /* error return value */ -#endif - int ptr; /* record number */ - xfs_inobt_rec_t *rec; /* record data */ + xfs_inobt_block_t *block; /* btree block to log from */ + int first; /* first byte offset logged */ + int last; /* last byte offset logged */ + xfs_inobt_rec_t *rp; /* record pointer for btree block */ - bp = cur->bc_bufs[0]; - ptr = cur->bc_ptrs[0]; + XFS_BTREE_TRACE_CURSOR(cur, ENTRY); + XFS_BTREE_TRACE_ARGBII(cur, bp, rfirst, rlast); block = XFS_BUF_TO_INOBT_BLOCK(bp); -#ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, block, 
0, bp))) - return error; -#endif + rp = XFS_INOBT_REC_ADDR(block, 1, cur); + first = (int)((xfs_caddr_t)&rp[rfirst - 1] - (xfs_caddr_t)block); + last = (int)(((xfs_caddr_t)&rp[rlast] - 1) - (xfs_caddr_t)block); + xfs_trans_log_buf(cur->bc_tp, bp, first, last); + XFS_BTREE_TRACE_CURSOR(cur, EXIT); +} + +static const struct xfs_btree_record_ops xfs_inobt_recops = { + .get_minrecs = xfs_inobt_get_minrecs, + .get_maxrecs = xfs_inobt_get_maxrecs, + .get_numrecs = xfs_btree_get_numrecs, + .set_numrecs = xfs_btree_set_numrecs, + + .init_key_from_rec = xfs_inobt_init_key_from_rec, + .init_ptr_from_cur = xfs_inobt_init_ptr_from_cur, + .init_rec_from_key = xfs_inobt_init_rec_from_key, + .init_rec_from_cur = xfs_inobt_init_rec_from_cur, + + .key_addr = xfs_inobt_key_addr, + .ptr_addr = xfs_inobt_ptr_addr, + .rec_addr = xfs_inobt_rec_addr, + + .key_diff = xfs_inobt_key_diff, + .ptr_to_daddr = xfs_inobt_ptr_to_daddr, + + .move_keys = xfs_inobt_move_keys, + .move_ptrs = xfs_inobt_move_ptrs, + .move_recs = xfs_inobt_move_recs, + + .set_key = xfs_inobt_set_key, + .set_ptr = xfs_inobt_set_ptr, + .set_rec = xfs_inobt_set_rec, + + .log_keys = xfs_inobt_log_keys, + .log_ptrs = xfs_inobt_log_ptrs, + .log_recs = xfs_inobt_log_recs, + + .check_ptrs = xfs_btree_check_sptr, +}; + +STATIC void +xfs_inobt_setroot( + xfs_btree_cur_t *cur, + xfs_btree_ptr_t *nptr, + int inc) /* level change */ +{ + xfs_buf_t *agbp = cur->bc_private.i.agbp; + xfs_agi_t *agi = XFS_BUF_TO_AGI(agbp); + + agi->agi_root = nptr->u.inobt; + be32_add(&agi->agi_level, inc); + xfs_ialloc_log_agi(cur->bc_tp, agbp, XFS_AGI_ROOT | XFS_AGI_LEVEL); +} + + +STATIC int +xfs_inobt_killroot( + xfs_btree_cur_t *cur, + int level, + xfs_btree_ptr_t *newroot) +{ + xfs_buf_t *agbp = cur->bc_private.i.agbp; + xfs_agi_t *agi = XFS_BUF_TO_AGI(agbp); + xfs_agblock_t bno; + int error; + /* - * Off the right end or left end, return failure. + * Set the root entry in the a.g. inode structure, + * decreasing the level by 1. 
*/ - if (ptr > be16_to_cpu(block->bb_numrecs) || ptr <= 0) { - *stat = 0; - return 0; - } + bno = be32_to_cpu(agi->agi_root); + xfs_inobt_setroot(cur, newroot, -1); /* - * Point to the record and extract its data. + * Free the old root. */ - rec = XFS_INOBT_REC_ADDR(block, ptr, cur); - *ino = be32_to_cpu(rec->ir_startino); - *fcnt = be32_to_cpu(rec->ir_freecount); - *free = be64_to_cpu(rec->ir_free); - *stat = 1; + error = xfs_free_extent(cur->bc_tp, + XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.i.agno, bno), 1); + if (error) + return error; + xfs_trans_binval(cur->bc_tp, cur->bc_bufs[level]); + /* + * Update the cursor so there's one fewer level. + */ + cur->bc_bufs[level] = NULL; + cur->bc_nlevels--; return 0; } +static const struct xfs_btree_cur_ops xfs_inobt_curops = { + .set_root = xfs_inobt_setroot, + .new_root = xfs_btree_newroot, + .kill_root = xfs_inobt_killroot, +}; + + +#if defined(XFS_BTREE_TRACE) + /* - * Increment cursor by one record at the level. - * For nonzero levels the leaf-ward information is untouched. + * Global inobt trace buffer */ -int /* error */ -xfs_inobt_increment( - xfs_btree_cur_t *cur, /* btree cursor */ - int level, /* level in btree, 0 is leaf */ - int *stat) /* success/failure */ +ktrace_t *xfs_inobt_trace_buf; +/* + * Add a trace buffer entry for the arguments given to the routine, + * generic form. 
+ */ +STATIC void +xfs_inobt_trace_enter( + const char *func, + xfs_btree_cur_t *cur, + char *s, + int type, + int line, + __psunsigned_t a0, + __psunsigned_t a1, + __psunsigned_t a2, + __psunsigned_t a3, + __psunsigned_t a4, + __psunsigned_t a5, + __psunsigned_t a6, + __psunsigned_t a7, + __psunsigned_t a8, + __psunsigned_t a9, + __psunsigned_t a10) +{ + ktrace_enter(xfs_inobt_trace_buf, + (void *)(__psint_t)type, + (void *)func, (void *)s, NULL, (void *)cur, + (void *)a0, (void *)a1, (void *)a2, (void *)a3, + (void *)a4, (void *)a5, (void *)a6, (void *)a7, + (void *)a8, (void *)a9, (void *)a10); +} + +STATIC void +xfs_inobt_trace_cursor( + xfs_btree_cur_t *cur, + __uint32_t *s0, + __uint64_t *l0, + __uint64_t *l1) +{ + *s0 = cur->bc_private.i.agno; + *l0 = cur->bc_rec.i.ir_startino; + *l1 = cur->bc_rec.i.ir_free; +} + +STATIC void +xfs_inobt_trace_record( + xfs_btree_cur_t *cur, + xfs_btree_rec_t *rec, + __uint64_t *l0, + __uint64_t *l1, + __uint64_t *l2) { - xfs_inobt_block_t *block; /* btree block */ - xfs_buf_t *bp; /* buffer containing btree block */ - int error; /* error return value */ - int lev; /* btree level */ + *l0 = be32_to_cpu(rec->u.inobt.ir_startino); + *l1 = be32_to_cpu(rec->u.inobt.ir_freecount); + *l2 = be64_to_cpu(rec->u.inobt.ir_free); +} - ASSERT(level < cur->bc_nlevels); - /* - * Read-ahead to the right at this level. - */ - xfs_btree_readahead(cur, level, XFS_BTCUR_RIGHTRA); - /* - * Get a pointer to the btree block. - */ - bp = cur->bc_bufs[level]; - block = XFS_BUF_TO_INOBT_BLOCK(bp); -#ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, block, level, bp))) - return error; +static const struct xfs_btree_trc_ops xfs_inobt_trcops = { + .enter = xfs_inobt_trace_enter, + .cursor = xfs_inobt_trace_cursor, + .record = xfs_inobt_trace_record, +}; #endif - /* - * Increment the ptr at this level. If we're still in the block - * then we're done. 
- */ - if (++cur->bc_ptrs[level] <= be16_to_cpu(block->bb_numrecs)) { - *stat = 1; - return 0; - } - /* - * If we just went off the right edge of the tree, return failure. - */ - if (be32_to_cpu(block->bb_rightsib) == NULLAGBLOCK) { - *stat = 0; - return 0; - } - /* - * March up the tree incrementing pointers. - * Stop when we don't go off the right edge of a block. - */ - for (lev = level + 1; lev < cur->bc_nlevels; lev++) { - bp = cur->bc_bufs[lev]; - block = XFS_BUF_TO_INOBT_BLOCK(bp); -#ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, block, lev, bp))) - return error; + +void +xfs_inobt_init_cursor( + xfs_btree_cur_t *cur) +{ + cur->bc_flags = 0; + cur->bc_curops = &xfs_inobt_curops; + cur->bc_blkops = &xfs_inobt_blkops; + cur->bc_recops = &xfs_inobt_recops; +#if defined(XFS_BTREE_TRACE) + cur->bc_trcops = &xfs_inobt_trcops; #endif - if (++cur->bc_ptrs[lev] <= be16_to_cpu(block->bb_numrecs)) - break; - /* - * Read-ahead the right block, we're going to read it - * in the next loop. - */ - xfs_btree_readahead(cur, lev, XFS_BTCUR_RIGHTRA); - } - /* - * If we went off the root then we are seriously confused. - */ - ASSERT(lev < cur->bc_nlevels); - /* - * Now walk back down the tree, fixing up the cursor's buffer - * pointers and key numbers. - */ - for (bp = cur->bc_bufs[lev], block = XFS_BUF_TO_INOBT_BLOCK(bp); - lev > level; ) { - xfs_agblock_t agbno; /* block number of btree block */ - - agbno = be32_to_cpu(*XFS_INOBT_PTR_ADDR(block, cur->bc_ptrs[lev], cur)); - if ((error = xfs_btree_read_bufs(cur->bc_mp, cur->bc_tp, - cur->bc_private.i.agno, agbno, 0, &bp, - XFS_INO_BTREE_REF))) - return error; - lev--; - xfs_btree_setbuf(cur, lev, bp); - block = XFS_BUF_TO_INOBT_BLOCK(bp); - if ((error = xfs_btree_check_sblock(cur, block, lev, bp))) - return error; - cur->bc_ptrs[lev] = 1; - } - *stat = 1; - return 0; } /* - * Insert the current record at the point referenced by cur. - * The cursor may be inconsistent on return if splits have been done. 
+ * INOBT functions that are not covered by core btree code. + * Externally visible routines. + */ + +/* + * Update the record referred to by cur to the value given + * by [ino, fcnt, free]. + * This either works (return 0) or gets an EFSCORRUPTED error. */ int /* error */ -xfs_inobt_insert( - xfs_btree_cur_t *cur, /* btree cursor */ - int *stat) /* success/failure */ +xfs_inobt_update( + xfs_btree_cur_t *cur, /* btree cursor */ + xfs_agino_t ino, /* starting inode of chunk */ + __int32_t fcnt, /* free inode count */ + xfs_inofree_t free) /* free inode mask */ { - int error; /* error return value */ - int i; /* result value, 0 for failure */ - int level; /* current level number in btree */ - xfs_agblock_t nbno; /* new block number (split result) */ - xfs_btree_cur_t *ncur; /* new cursor (split result) */ - xfs_inobt_rec_t nrec; /* record being inserted this level */ - xfs_btree_cur_t *pcur; /* previous level's cursor */ - - level = 0; - nbno = NULLAGBLOCK; - nrec.ir_startino = cpu_to_be32(cur->bc_rec.i.ir_startino); - nrec.ir_freecount = cpu_to_be32(cur->bc_rec.i.ir_freecount); - nrec.ir_free = cpu_to_be64(cur->bc_rec.i.ir_free); - ncur = NULL; - pcur = cur; - /* - * Loop going up the tree, starting at the leaf level. - * Stop when we don't get a split block, that must mean that - * the insert is finished with this level. - */ - do { - /* - * Insert nrec/nbno into this level of the tree. - * Note if we fail, nbno will be null. - */ - if ((error = xfs_inobt_insrec(pcur, level++, &nbno, &nrec, &ncur, - &i))) { - if (pcur != cur) - xfs_btree_del_cursor(pcur, XFS_BTREE_ERROR); - return error; - } - /* - * See if the cursor we just used is trash. - * Can't trash the caller's cursor, but otherwise we should - * if ncur is a new cursor or we're about to be done. - */ - if (pcur != cur && (ncur || nbno == NULLAGBLOCK)) { - cur->bc_nlevels = pcur->bc_nlevels; - xfs_btree_del_cursor(pcur, XFS_BTREE_NOERROR); - } - /* - * If we got a new cursor, switch to it. 
- */ - if (ncur) { - pcur = ncur; - ncur = NULL; - } - } while (nbno != NULLAGBLOCK); - *stat = i; - return 0; + xfs_btree_rec_t rec; + + rec.u.inobt.ir_startino = cpu_to_be32(ino); + rec.u.inobt.ir_freecount = cpu_to_be32(fcnt); + rec.u.inobt.ir_free = cpu_to_be64(free); + return xfs_btree_update(cur, &rec); } /* @@ -1986,7 +703,7 @@ xfs_inobt_lookup_eq( cur->bc_rec.i.ir_startino = ino; cur->bc_rec.i.ir_freecount = fcnt; cur->bc_rec.i.ir_free = free; - return xfs_inobt_lookup(cur, XFS_LOOKUP_EQ, stat); + return xfs_btree_lookup(cur, XFS_LOOKUP_EQ, stat); } /* @@ -2004,7 +721,7 @@ xfs_inobt_lookup_ge( cur->bc_rec.i.ir_startino = ino; cur->bc_rec.i.ir_freecount = fcnt; cur->bc_rec.i.ir_free = free; - return xfs_inobt_lookup(cur, XFS_LOOKUP_GE, stat); + return xfs_btree_lookup(cur, XFS_LOOKUP_GE, stat); } /* @@ -2022,57 +739,55 @@ xfs_inobt_lookup_le( cur->bc_rec.i.ir_startino = ino; cur->bc_rec.i.ir_freecount = fcnt; cur->bc_rec.i.ir_free = free; - return xfs_inobt_lookup(cur, XFS_LOOKUP_LE, stat); + return xfs_btree_lookup(cur, XFS_LOOKUP_LE, stat); } /* - * Update the record referred to by cur, to the value given - * by [ino, fcnt, free]. - * This either works (return 0) or gets an EFSCORRUPTED error. + * Get the data from the pointed-to record. 
*/ int /* error */ -xfs_inobt_update( +xfs_inobt_get_rec( xfs_btree_cur_t *cur, /* btree cursor */ - xfs_agino_t ino, /* starting inode of chunk */ - __int32_t fcnt, /* free inode count */ - xfs_inofree_t free) /* free inode mask */ + xfs_agino_t *ino, /* output: starting inode of chunk */ + __int32_t *fcnt, /* output: number of free inodes */ + xfs_inofree_t *free, /* output: free inode mask */ + int *stat) /* output: success/failure */ { - xfs_inobt_block_t *block; /* btree block to update */ + xfs_btree_block_t *block; /* btree block */ xfs_buf_t *bp; /* buffer containing btree block */ +#ifdef DEBUG int error; /* error return value */ - int ptr; /* current record number (updating) */ - xfs_inobt_rec_t *rp; /* pointer to updated record */ +#endif + int ptr; /* record number */ + xfs_btree_rec_t *rec; /* record data */ - /* - * Pick up the current block. - */ - bp = cur->bc_bufs[0]; - block = XFS_BUF_TO_INOBT_BLOCK(bp); + XFS_BTREE_TRACE_CURSOR(cur, ENTRY); + XFS_BTREE_TRACE_ARGFFF(cur, *ino, *fcnt, *free); + + ptr = cur->bc_ptrs[0]; + block = xfs_inobt_get_block(cur, 0, &bp); #ifdef DEBUG - if ((error = xfs_btree_check_sblock(cur, block, 0, bp))) + error = xfs_btree_check_sblock(cur, block, 0, bp); + if (error) return error; #endif /* - * Get the address of the rec to be updated. - */ - ptr = cur->bc_ptrs[0]; - rp = XFS_INOBT_REC_ADDR(block, ptr, cur); - /* - * Fill in the new contents and log them. + * Off the right end or left end, return failure. */ - rp->ir_startino = cpu_to_be32(ino); - rp->ir_freecount = cpu_to_be32(fcnt); - rp->ir_free = cpu_to_be64(free); - xfs_inobt_log_recs(cur, bp, ptr, ptr); + if (ptr > be16_to_cpu(block->bb_h.bb_numrecs) || ptr <= 0) { + XFS_BTREE_TRACE_CURSOR(cur, EXIT); + *stat = 0; + return 0; + } /* - * Updating first record in leaf. Pass new key value up to our parent. + * Point to the record and extract its data. 
*/ - if (ptr == 1) { - xfs_inobt_key_t key; /* key containing [ino] */ - - key.ir_startino = cpu_to_be32(ino); - if ((error = xfs_inobt_updkey(cur, &key, 1))) - return error; - } + rec = xfs_inobt_rec_addr(cur, ptr, block); + *ino = be32_to_cpu(rec->u.inobt.ir_startino); + *fcnt = be32_to_cpu(rec->u.inobt.ir_freecount); + *free = be64_to_cpu(rec->u.inobt.ir_free); + XFS_BTREE_TRACE_CURSOR(cur, EXIT); + *stat = 1; return 0; } + Index: 2.6.x-xfs-new/fs/xfs/xfs_ialloc_btree.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_ialloc_btree.h 2007-10-15 09:58:18.000000000 +1000 +++ 2.6.x-xfs-new/fs/xfs/xfs_ialloc_btree.h 2007-11-06 19:40:29.770666321 +1100 @@ -116,6 +116,8 @@ typedef struct xfs_btree_sblock xfs_inob (XFS_BTREE_PTR_ADDR(xfs_inobt, bb, \ i, XFS_INOBT_BLOCK_MAXRECS(1, cur))) +extern void xfs_inobt_init_cursor(struct xfs_btree_cur *cur); + /* * Decrement cursor by one record at the level. * For nonzero levels the leaf-ward information is untouched. Index: 2.6.x-xfs-new/fs/xfs/xfs_itable.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_itable.c 2007-10-24 16:01:47.000000000 +1000 +++ 2.6.x-xfs-new/fs/xfs/xfs_itable.c 2007-11-06 19:40:29.770666321 +1100 @@ -475,7 +475,7 @@ xfs_bulkstat( * In any case, increment to the next record. */ if (!error) - error = xfs_inobt_increment(cur, 0, &tmp); + error = xfs_btree_increment(cur, 0, &tmp); } else { /* * Start of ag. Lookup the first inode chunk. @@ -541,7 +541,7 @@ xfs_bulkstat( * Set agino to after this chunk and bump the cursor. */ agino = gino + XFS_INODES_PER_CHUNK; - error = xfs_inobt_increment(cur, 0, &tmp); + error = xfs_btree_increment(cur, 0, &tmp); } /* * Drop the btree buffers and the agi buffer. 
@@ -881,7 +881,7 @@ xfs_inumbers( bufidx = 0; } if (left) { - error = xfs_inobt_increment(cur, 0, &tmp); + error = xfs_btree_increment(cur, 0, &tmp); if (error) { xfs_btree_del_cursor(cur, XFS_BTREE_ERROR); cur = NULL; From owner-xfs@oss.sgi.com Tue Nov 6 01:21:38 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 06 Nov 2007 01:21:44 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from spike.grumly.eu.org (spike.grumly.eu.org [195.5.253.226]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA69LXZZ003355 for ; Tue, 6 Nov 2007 01:21:37 -0800 Received: by spike.grumly.eu.org (Postfix, from userid 1001) id AD82611984; Tue, 6 Nov 2007 10:21:57 +0100 (CET) Date: Tue, 6 Nov 2007 10:21:57 +0100 From: Cedric - Equinoxe Media To: David Chinner Cc: xfs@oss.sgi.com Subject: Re: xfs crash Message-ID: <20071106092157.GB16694@e-m.fr> References: <20071105215135.GA12238@e-m.fr> <20071106082632.GU995458@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20071106082632.GU995458@sgi.com> X-Virus-Scanned: ClamAV 0.91.2/4680/Mon Nov 5 20:49:40 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13563 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: cedric@e-m.fr Precedence: bulk X-list: xfs On 06/11/2007 19:26, David Chinner wrote: > On Mon, Nov 05, 2007 at 10:51:35PM +0100, Cedric - Equinoxe Media wrote: > > XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1563 of file > > fs/xfs/xfs_alloc.c. 
Caller 0xffffffff88113b35 > > > > Call Trace: > > [] :xfs:xfs_free_ag_extent+0x1a6/0x6b5 > > [] :xfs:xfs_free_extent+0xa9/0xc9 > > [] :xfs:xfs_bmap_finish+0xee/0x167 > > [] :xfs:xfs_itruncate_finish+0x19b/0x2e0 > > [] :xfs:xfs_setattr+0x841/0xe57 > > Corrupted free space btree, by the look of it. Can you run > xfs_check on the filesystem and report the output. You can recover > from this by running xfs_repair. xfs_check /dev/sda4 : bad format 2 for inode 2961770479 type 0 bad format 2 for inode 3229517262 type 0 block 20/621714 type unknown not expected link count mismatch for inode 2961770479 (name ?), nlink 0, counted 1 link count mismatch for inode 3229517262 (name ?), nlink 0, counted 1 the xfs_repair worked perfectly. Do you have an idea why this corruption happened ? Thanks. Cédric From owner-xfs@oss.sgi.com Tue Nov 6 06:41:33 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 06 Nov 2007 06:41:35 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.0 required=5.0 tests=BAYES_05,SUBJECT_FUZZY_TION autolearn=no version=3.3.0-r574664 Received: from r2d2.neofacto.lu (mail.neofacto.lu [158.64.60.195]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA6EfULA018283 for ; Tue, 6 Nov 2007 06:41:32 -0800 Received: from localhost (localhost [127.0.0.1]) by r2d2.neofacto.lu (Postfix) with ESMTP id 323CCC2F3E for ; Tue, 6 Nov 2007 15:09:14 +0100 (CET) X-Virus-Scanned: ClamAV 0.91.2/4680/Mon Nov 5 20:49:40 2007 on oss.sgi.com X-Virus-Scanned: Ubuntu amavisd-new at neofacto.lu Received: from r2d2.neofacto.lu ([127.0.0.1]) by localhost (r2d2.neofacto.lu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id cYvlIgzHPDPY for ; Tue, 6 Nov 2007 15:09:00 +0100 (CET) Received: by r2d2.neofacto.lu (Postfix, from userid 65534) id C956DC302E; Tue, 6 Nov 2007 14:58:24 +0100 (CET) Received: from [192.168.1.166] (SU105.tudor.lu [158.64.4.205]) (using TLSv1 with cipher 
DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by r2d2.neofacto.lu (Postfix) with ESMTP id 80341C2F49 for ; Tue, 6 Nov 2007 14:58:15 +0100 (CET) Message-ID: <473072FD.4070104@jamendo.com> Date: Tue, 06 Nov 2007 14:58:21 +0100 From: Amandine AUPETIT User-Agent: Thunderbird 2.0.0.6 (X11/20071022) MIME-Version: 1.0 To: xfs@oss.sgi.com Subject: 7Tb XFS partition lost on reboot Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Status: Clean X-archive-position: 13564 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: amandine@jamendo.com Precedence: bulk X-list: xfs Hi ! (I'm on Ubuntu 7.10 64 bits) I have an array of 12 750gb disks in hardware Raid 6 that gives me a 7Tb partition. I tried to format it in ext3, but it took too much time, so I tried Reiserfs, but the partition was lost on reboot, and now I'm trying XFS. So I created the partition with parted, because fdisk can't do partitions larger than 2tb. It's ok, I can do what I want but... on reboot, there is a superblock problem, something like that. When I check with xfs_check : xfs_check: unexpected XFS SB magic number 0x00000000 xfs_check: read failed: Invalid argument xfs_check: data size check failed cache_node_purge: refcount was 1, not zero (node=0x681420) xfs_check: cannot read root inode (22) bad superblock magic number 0, giving up So, I tried to delete the partition with parted and recreate a new one. No problem. But when I mount the new partition, all the data that was on my deleted partition is there !!! That's of course not a problem, but I'm wondering if there's a way to have this partition work directly without having to delete and recreate it ?
I checked the /proc/partition before and after doing parted : BEFORE major minor #blocks name 104 0 35532720 cciss/c0d0 104 1 34025638 cciss/c0d0p1 104 2 1 cciss/c0d0p2 104 5 1502046 cciss/c0d0p5 105 0 7325417080 cciss/c1d0 105 1 [B]882966102[/B] cciss/c1d0p1 AFTER major minor #blocks name 104 0 35532720 cciss/c0d0 104 1 34025638 cciss/c0d0p1 104 2 1 cciss/c0d0p2 104 5 1502046 cciss/c0d0p5 105 0 7325417080 cciss/c1d0 105 1 [B]7325417046[/B] cciss/c1d0p1 So, any idea what I could do ? Thanks a lot Amandine From owner-xfs@oss.sgi.com Tue Nov 6 08:07:03 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 06 Nov 2007 08:07:07 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from spike.grumly.eu.org (spike.grumly.eu.org [195.5.253.226]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA6G6xxb030197 for ; Tue, 6 Nov 2007 08:07:02 -0800 Received: by spike.grumly.eu.org (Postfix, from userid 1001) id 5AC22119FA; Tue, 6 Nov 2007 17:07:21 +0100 (CET) Date: Tue, 6 Nov 2007 17:07:21 +0100 From: Cedric - Equinoxe Media To: David Chinner Cc: xfs@oss.sgi.com Subject: Re: xfs crash Message-ID: <20071106160721.GB25295@e-m.fr> References: <20071105215135.GA12238@e-m.fr> <20071106082632.GU995458@sgi.com> <20071106092157.GB16694@e-m.fr> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20071106092157.GB16694@e-m.fr> X-Virus-Scanned: ClamAV 0.91.2/4681/Tue Nov 6 04:52:41 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13565 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: cedric@e-m.fr Precedence: bulk X-list: xfs I just had exactly the same crash again today : /dev/sda4 on /filer type xfs (rw,noexec,nosuid,nodev,noatime) Nov 6 16:40:24 
fng2 kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1563 of file fs/xfs/xfs_alloc.c. Caller 0xffffffff88113b35 Nov 6 16:40:24 fng2 kernel: Nov 6 16:40:24 fng2 kernel: Call Trace: Nov 6 16:40:24 fng2 kernel: [] :xfs:xfs_free_ag_extent+0x1a6/0x6b5 Nov 6 16:40:24 fng2 kernel: [] :xfs:xfs_free_extent+0xa9/0xc9 Nov 6 16:40:24 fng2 kernel: [] :xfs:xfs_bmap_finish+0xee/0x167 Nov 6 16:40:24 fng2 kernel: [] :xfs:xfs_itruncate_finish+0x19b/0x2e0 Nov 6 16:40:24 fng2 kernel: [] :xfs:xfs_setattr+0x841/0xe57 Nov 6 16:40:24 fng2 kernel: [] :xfs:xfs_fs_get_dentry+0x38/0x59 Nov 6 16:40:24 fng2 kernel: [] task_rq_lock+0x3d/0x6f Nov 6 16:40:24 fng2 kernel: [] __activate_task+0x26/0x38 Nov 6 16:40:24 fng2 kernel: [] :xfs:xfs_vn_setattr+0x121/0x144 Nov 6 16:40:24 fng2 kernel: [] notify_change+0x156/0x2f1 Nov 6 16:40:24 fng2 kernel: [] :nfsd:nfsd_setattr+0x334/0x4b1 Nov 6 16:40:24 fng2 kernel: [] :nfsd:nfsd3_proc_setattr+0xa2/0xae Nov 6 16:40:24 fng2 kernel: [] :nfsd:nfsd_dispatch+0xdd/0x19e Nov 6 16:40:24 fng2 kernel: [] :sunrpc:svc_process+0x3df/0x6ef Nov 6 16:40:24 fng2 kernel: [] __down_read+0x12/0x9a Nov 6 16:40:24 fng2 kernel: [] :nfsd:nfsd+0x191/0x2ac Nov 6 16:40:24 fng2 kernel: [] child_rip+0xa/0x12 Nov 6 16:40:24 fng2 kernel: [] :nfsd:nfsd+0x0/0x2ac Nov 6 16:40:24 fng2 kernel: [] child_rip+0x0/0x12 Nov 6 16:40:24 fng2 kernel: Nov 6 16:40:24 fng2 kernel: xfs_force_shutdown(sda4,0x8) called from line 4258 of file fs/xfs/xfs_bmap.c. Return address = 0xffffffff8811cfb4 Nov 6 16:40:24 fng2 kernel: Filesystem "sda4": Corruption of in-memory data detected. Shutting down filesystem: sda4 Nov 6 16:40:24 fng2 kernel: Please umount the filesystem, and rectify the problem(s) Seems to be again on a setattr() ? Regards. 
Cédric From owner-xfs@oss.sgi.com Tue Nov 6 08:44:15 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 06 Nov 2007 08:44:19 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-0.8 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS autolearn=ham version=3.3.0-r574664 Received: from lucidpixels.com (lucidpixels.com [75.144.35.66]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA6GiEm2007594 for ; Tue, 6 Nov 2007 08:44:15 -0800 Received: by lucidpixels.com (Postfix, from userid 1001) id 58AC71C000263; Tue, 6 Nov 2007 11:44:19 -0500 (EST) Received: from localhost (localhost [127.0.0.1]) by lucidpixels.com (Postfix) with ESMTP id 549744019521; Tue, 6 Nov 2007 11:44:19 -0500 (EST) Date: Tue, 6 Nov 2007 11:44:19 -0500 (EST) From: Justin Piszcz X-X-Sender: jpiszcz@p34.internal.lan To: Cedric - Equinoxe Media cc: David Chinner , xfs@oss.sgi.com Subject: Re: xfs crash In-Reply-To: <20071106160721.GB25295@e-m.fr> Message-ID: References: <20071105215135.GA12238@e-m.fr> <20071106082632.GU995458@sgi.com> <20071106092157.GB16694@e-m.fr> <20071106160721.GB25295@e-m.fr> MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="-1463747160-1479258781-1194367459=:17411" X-Virus-Scanned: ClamAV 0.91.2/4681/Tue Nov 6 04:52:41 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13566 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jpiszcz@lucidpixels.com Precedence: bulk X-list: xfs This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. 
---1463747160-1479258781-1194367459=:17411 Content-Type: TEXT/PLAIN; charset=iso-8859-1; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE On Tue, 6 Nov 2007, Cedric - Equinoxe Media wrote: > I just had exactly the same crash again today : > /dev/sda4 on /filer type xfs (rw,noexec,nosuid,nodev,noatime) > > Nov 6 16:40:24 fng2 kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1563 of file fs/xfs/xfs_alloc.c. Caller 0xffffffff88113b35 > Nov 6 16:40:24 fng2 kernel: > Nov 6 16:40:24 fng2 kernel: Call Trace: > Nov 6 16:40:24 fng2 kernel: [] :xfs:xfs_free_ag_extent+0x1a6/0x6b5 > Nov 6 16:40:24 fng2 kernel: [] :xfs:xfs_free_extent+0xa9/0xc9 > Nov 6 16:40:24 fng2 kernel: [] :xfs:xfs_bmap_finish+0xee/0x167 > Nov 6 16:40:24 fng2 kernel: [] :xfs:xfs_itruncate_finish+0x19b/0x2e0 > Nov 6 16:40:24 fng2 kernel: [] :xfs:xfs_setattr+0x841/0xe57 > Nov 6 16:40:24 fng2 kernel: [] :xfs:xfs_fs_get_dentry+0x38/0x59 > Nov 6 16:40:24 fng2 kernel: [] task_rq_lock+0x3d/0x6f > Nov 6 16:40:24 fng2 kernel: [] __activate_task+0x26/0x38 > Nov 6 16:40:24 fng2 kernel: [] :xfs:xfs_vn_setattr+0x121/0x144 > Nov 6 16:40:24 fng2 kernel: [] notify_change+0x156/0x2f1 > Nov 6 16:40:24 fng2 kernel: [] :nfsd:nfsd_setattr+0x334/0x4b1 > Nov 6 16:40:24 fng2 kernel: [] :nfsd:nfsd3_proc_setattr+0xa2/0xae > Nov 6 16:40:24 fng2 kernel: [] :nfsd:nfsd_dispatch+0xdd/0x19e > Nov 6 16:40:24 fng2 kernel: [] :sunrpc:svc_process+0x3df/0x6ef > Nov 6 16:40:24 fng2 kernel: [] __down_read+0x12/0x9a > Nov 6 16:40:24 fng2 kernel: [] :nfsd:nfsd+0x191/0x2ac > Nov 6 16:40:24 fng2 kernel: [] child_rip+0xa/0x12 > Nov 6 16:40:24 fng2 kernel: [] :nfsd:nfsd+0x0/0x2ac > Nov 6 16:40:24 fng2 kernel: [] child_rip+0x0/0x12 > Nov 6 16:40:24 fng2 kernel: > Nov 6 16:40:24 fng2 kernel: xfs_force_shutdown(sda4,0x8) called from line 4258 of file fs/xfs/xfs_bmap.c. Return address = 0xffffffff8811cfb4 > Nov 6 16:40:24 fng2 kernel: Filesystem "sda4": Corruption of in-memory data detected.
Shutting down filesystem: sda4 > Nov 6 16:40:24 fng2 kernel: Please umount the filesystem, and rectify the problem(s) > > Seems to be again on a setattr() ? > > Regards. > Cédric > > Have you run a memory test on your server? memtest86 ---1463747160-1479258781-1194367459=:17411-- From owner-xfs@oss.sgi.com Tue Nov 6 08:46:13 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 06 Nov 2007 08:46:16 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.3 required=5.0 tests=AWL,BAYES_00, SUBJECT_FUZZY_TION autolearn=no version=3.3.0-r574664 Received: from hogwarts.egr.duke.edu (hogwarts.egr.duke.edu [152.3.195.84]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA6GkAKd008127 for ; Tue, 6 Nov 2007 08:46:13 -0800 Received: from hogwarts.egr.duke.edu (localhost.localdomain [127.0.0.1]) by hogwarts.egr.duke.edu (8.13.1/8.13.1) with ESMTP id lA6GkBNZ017533; Tue, 6 Nov 2007 11:46:11 -0500 Received: from localhost (jlb@localhost) by hogwarts.egr.duke.edu (8.13.1/8.13.1/Submit) with ESMTP id lA6Gk9uA017530; Tue, 6 Nov 2007 11:46:11 -0500 X-Authentication-Warning: hogwarts.egr.duke.edu: jlb owned process doing -bs Date: Tue, 6 Nov 2007 11:46:09 -0500 (EST) From: Joshua Baker-LePain X-X-Sender: jlb@hogwarts.egr.duke.edu To: Amandine AUPETIT cc: xfs@oss.sgi.com Subject: Re: 7Tb XFS partition lost on reboot In-Reply-To: <473072FD.4070104@jamendo.com> Message-ID: References: <473072FD.4070104@jamendo.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Virus-Scanned: ClamAV 0.91.2/4681/Tue Nov 6 04:52:41 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13567 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jlb17@duke.edu Precedence: bulk X-list: xfs On Tue, 6 Nov 2007 at 2:58pm, Amandine AUPETIT wrote > So I created the partition with parted, because fdisk can't do
more that 2tb > partitions. > It's ok, I can do what I want but... > > on reboot, there is a Superblock problem, something like that. When I check > with xfs_check : First guess -- did you use a gpt disklabel on that device? Standard (msdos) disklabels don't work on devices >2TB. The usual symptom of a big device with an msdos disklabel is that the partition table goes away on reboot. -- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF From owner-xfs@oss.sgi.com Tue Nov 6 09:08:16 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 06 Nov 2007 09:08:23 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from spike.grumly.eu.org (spike.grumly.eu.org [195.5.253.226]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA6H8CYu011051 for ; Tue, 6 Nov 2007 09:08:16 -0800 Received: by spike.grumly.eu.org (Postfix, from userid 1001) id 072A811960; Tue, 6 Nov 2007 18:08:37 +0100 (CET) Date: Tue, 6 Nov 2007 18:08:37 +0100 From: Cedric - Equinoxe Media To: xfs@oss.sgi.com Subject: Re: xfs crash Message-ID: <20071106170837.GC25295@e-m.fr> References: <20071105215135.GA12238@e-m.fr> <20071106082632.GU995458@sgi.com> <20071106092157.GB16694@e-m.fr> <20071106160721.GB25295@e-m.fr> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Virus-Scanned: ClamAV 0.91.2/4682/Tue Nov 6 07:42:37 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13568 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: cedric@e-m.fr Precedence: bulk X-list: xfs On 06/11/2007 11:44, Justin Piszcz wrote: > Have you run a memory test on your server? 
memtest86 But I doubt it is hardware memory corruption because it is a brand new dell server with ECC memory and the backtrace is always the same. Anyway I will do the memtest tomorrow... I also have a spare server, If I find nothing I will move everything to the spare and wait for the possible bug to appear again. Regards Cédric From owner-xfs@oss.sgi.com Tue Nov 6 09:27:04 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 06 Nov 2007 09:27:06 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=0.9 required=5.0 tests=AWL,BAYES_00,HTML_MESSAGE, J_CHICKENPOX_42,J_CHICKENPOX_43 autolearn=no version=3.3.0-r574664 Received: from wr-out-0506.google.com (wr-out-0506.google.com [64.233.184.225]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA6HR0ux013831 for ; Tue, 6 Nov 2007 09:27:03 -0800 Received: by wr-out-0506.google.com with SMTP id c48so148366wra for ; Tue, 06 Nov 2007 09:27:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:references; bh=bGZp5kFip+2MK0zxw/wQj1HCX6f9DWekhByHVUd5t/Q=; b=dXFsJn5GKaieMrqaHaA82fXGKfZeeOOK/nTTG9JTzY1ALbKzswlnWYNR0fCGZ42znRrIgHeGYmnWYqqN4dp9FfZVJaxz4DejapzGUM4JNR+KBYoRiJx39/WcftRBufb1L8TD7DHF2Nlq7PnVPfJ/lM0o9xECFxR2bHBUpjiL6Wk= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:references; b=Y6LFvWyhB8cO/5Xy9+vd2ZMy0ojbdyZB5CywX2ZPZAv/0FvFRYcWyKr7H5qc4hwv/VtGifrDGZ/DdLPd02ARe31qwTV79RxNdb4gxc9ourRmd+2sUH+9gNSWL4yOum9gPkm8gwFdL16bHXxRURwm06PnGCDatLqY0T+PzF0owME= Received: by 10.142.191.2 with SMTP id o2mr1670563wff.1194370023388; Tue, 06 Nov 2007 09:27:03 -0800 (PST) Received: by 10.142.162.19 with HTTP; Tue, 6 Nov 2007 09:27:03 -0800 (PST) Message-ID: Date: Tue, 6 Nov 2007 22:57:03 +0530 
From: "Bhagi rathi" To: "David Chinner" Subject: Re: TAKE 972756 - Implement fallocate. Cc: xfs@oss.sgi.com In-Reply-To: <20071106001223.GY66820511@sgi.com> MIME-Version: 1.0 References: <20071102024314.9BF3458C38F7@chook.melbourne.sgi.com> <20071106001223.GY66820511@sgi.com> X-Virus-Scanned: ClamAV 0.91.2/4682/Tue Nov 6 07:42:37 2007 on oss.sgi.com X-Virus-Status: Clean Content-Type: text/plain Content-Disposition: inline Content-Transfer-Encoding: 7bit Content-length: 1184 X-archive-position: 13569 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jahnu77@gmail.com Precedence: bulk X-list: xfs File is of size 1k. A 4k block is allocated as file-system block size is 4k. Preallocation happened from 1k to 256k. Now, it looks to me that we have un-written extents from 4k to 256k. There is no guarantee that data from 1k to 4k is all zero'es. Fallocate is updating size. Hence on subsequent read, we can get garbage from 1k to 4k and all zero'es from 4k to 256k Is the expectation here is application should take the responsibility of zero'ing data? I still need to through fallocate requirements. -Thanks, Bhagi. On 11/6/07, David Chinner wrote: > > On Tue, Nov 06, 2007 at 12:12:52AM +0530, Bhagi rathi wrote: > > David, What happens if offset is not aligned to 4k? Let's say we have a > file > > whose size is > > not aligned to 4k. It could have blocks beyond the eof which haven't > been > > zero'ed out. > > No it won't. They are *preallocated* blocks, which by definition are > zero-filled. Preallocated blocks are marked as unwritten on disk, so > it is known that they contain zeros, even if they lie beyond EOF. > > Cheers, > > Dave. 
> -- > Dave Chinner > Principal Engineer > SGI Australian Software Group > [[HTML alternate version deleted]] From owner-xfs@oss.sgi.com Tue Nov 6 10:01:27 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 06 Nov 2007 10:01:33 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=BAYES_50 autolearn=ham version=3.3.0-r574664 Received: from astoria.ccjclearline.com (astoria.ccjclearline.com [64.235.106.9]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA6I1Mh0018327 for ; Tue, 6 Nov 2007 10:01:26 -0800 Received: from [99.236.101.138] (helo=crashcourse.ca) by astoria.ccjclearline.com with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.68) (envelope-from ) id 1IpQNv-0007fs-6Z for xfs@oss.sgi.com; Tue, 06 Nov 2007 10:30:43 -0500 Date: Tue, 6 Nov 2007 10:28:44 -0500 (EST) From: "Robert P. J. Day" X-X-Sender: rpjday@localhost.localdomain To: xfs@oss.sgi.com Subject: use is_power_of_2() macro? 
Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - astoria.ccjclearline.com X-AntiAbuse: Original Domain - oss.sgi.com X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - crashcourse.ca X-Source: X-Source-Args: X-Source-Dir: X-Virus-Scanned: ClamAV 0.91.2/4682/Tue Nov 6 07:42:37 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13570 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: rpjday@crashcourse.ca Precedence: bulk X-list: xfs given this in fs/xfs/xfs_inode.c: /* * xfs_iroundup: round up argument to next power of two */ uint xfs_iroundup( uint v) { int i; uint m; if ((v & (v - 1)) == 0) return v; ASSERT((v & 0x80000000) == 0); if ((v & (v + 1)) == 0) return v + 1; for (i = 0, m = 1; i < 31; i++, m <<= 1) { if (v & m) continue; v |= m; if ((v & (v + 1)) == 0) return v + 1; } ASSERT(0); return( 0 ); } is there any reason that can't be rewritten with simply roundup_pow_of_two() as defined in include/linux/log2.h? #define roundup_pow_of_two(n) \ ( \ __builtin_constant_p(n) ? ( \ (n == 1) ? 1 : \ (1UL << (ilog2((n) - 1) + 1)) \ ) : \ __roundup_pow_of_two(n) \ ) just curious. rday -- ======================================================================== Robert P. J. 
Day Linux Consulting, Training and Annoying Kernel Pedantry Waterloo, Ontario, CANADA http://crashcourse.ca ======================================================================== From owner-xfs@oss.sgi.com Tue Nov 6 10:59:29 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 06 Nov 2007 10:59:36 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-4.6 required=5.0 tests=AWL,BAYES_00, RCVD_IN_DNSWL_MED,SPF_HELO_PASS,SUBJECT_FUZZY_TION autolearn=ham version=3.3.0-r574664 Received: from mx1.redhat.com (mx1.redhat.com [66.187.233.31]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA6IxPYn028921 for ; Tue, 6 Nov 2007 10:59:29 -0800 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.13.8/8.13.1) with ESMTP id lA6IxRXo000423; Tue, 6 Nov 2007 13:59:27 -0500 Received: from lacrosse.corp.redhat.com (lacrosse.corp.redhat.com [172.16.52.154]) by int-mx1.corp.redhat.com (8.13.1/8.13.1) with ESMTP id lA6IxR93023320; Tue, 6 Nov 2007 13:59:27 -0500 Received: from [10.15.80.10] (neon.msp.redhat.com [10.15.80.10]) by lacrosse.corp.redhat.com (8.12.11.20060308/8.11.6) with ESMTP id lA6IxO6U031373; Tue, 6 Nov 2007 13:59:26 -0500 Message-ID: <4730B98C.5090008@sandeen.net> Date: Tue, 06 Nov 2007 12:59:24 -0600 From: Eric Sandeen User-Agent: Thunderbird 1.5.0.12 (X11/20070530) MIME-Version: 1.0 To: Joshua Baker-LePain CC: Amandine AUPETIT , xfs@oss.sgi.com Subject: Re: 7Tb XFS partition lost on reboot References: <473072FD.4070104@jamendo.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4683/Tue Nov 6 10:30:56 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13571 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: sandeen@sandeen.net Precedence: bulk X-list: xfs Joshua 
Baker-LePain wrote: > On Tue, 6 Nov 2007 at 2:58pm, Amandine AUPETIT wrote > >> So I created the partition with parted, because fdisk can't do more that 2tb >> partitions. >> It's ok, I can do what I want but... >> >> on reboot, there is a Superblock problem, something like that. When I check >> with xfs_check : > > First guess -- did you use a gpt disklabel on that device? Standard > (msdos) disklabels don't work on devices >2TB. The usual symptom of a big > device with an msdos disklabel is that the partition table goes away on > reboot. I second that hunch. :) -Eric From owner-xfs@oss.sgi.com Tue Nov 6 11:04:04 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 06 Nov 2007 11:04:07 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-4.6 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_42, J_CHICKENPOX_43,RCVD_IN_DNSWL_MED,SPF_HELO_PASS autolearn=ham version=3.3.0-r574664 Received: from mx1.redhat.com (mx1.redhat.com [66.187.233.31]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA6J43b7029852 for ; Tue, 6 Nov 2007 11:04:04 -0800 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.13.8/8.13.1) with ESMTP id lA6J47a8001845; Tue, 6 Nov 2007 14:04:07 -0500 Received: from lacrosse.corp.redhat.com (lacrosse.corp.redhat.com [172.16.52.154]) by int-mx1.corp.redhat.com (8.13.1/8.13.1) with ESMTP id lA6J47xK026664; Tue, 6 Nov 2007 14:04:07 -0500 Received: from [10.15.80.10] (neon.msp.redhat.com [10.15.80.10]) by lacrosse.corp.redhat.com (8.12.11.20060308/8.11.6) with ESMTP id lA6J46uE000447; Tue, 6 Nov 2007 14:04:06 -0500 Message-ID: <4730BAA5.1080406@sandeen.net> Date: Tue, 06 Nov 2007 13:04:05 -0600 From: Eric Sandeen User-Agent: Thunderbird 1.5.0.12 (X11/20070530) MIME-Version: 1.0 To: Bhagi rathi CC: David Chinner , xfs@oss.sgi.com Subject: Re: TAKE 972756 - Implement fallocate. 
References: <20071102024314.9BF3458C38F7@chook.melbourne.sgi.com> <20071106001223.GY66820511@sgi.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4683/Tue Nov 6 10:30:56 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13572 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: sandeen@sandeen.net Precedence: bulk X-list: xfs Bhagi rathi wrote: > File is of size 1k. A 4k block is allocated as file-system block size is > 4k. > Preallocation happened from 1k to 256k. Now, it looks to me that we have > un-written extents from 4k to 256k. There is no guarantee that data from 1k > to 4k is all zero'es. Fallocate is updating size. Hence on subsequent read, > we can get garbage from 1k to 4k and all zero'es from 4k to 256k You've tested this and found it to be true? -Eric > Is the expectation here is application should take the responsibility of > zero'ing > data? I still need to through fallocate requirements. 
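[Editorial sketch, not part of any message in this thread.] The question raised above is whether a read after preallocation can return stale data between the old EOF (1k) and the end of the preallocated range (256k). A small userspace check along the following lines might help anyone who wants to test that on their own filesystem. It is only a sketch under stated assumptions: it uses the standard posix_fallocate() call as a stand-in for whatever fallocate interface the patch under discussion ends up exposing, the helper name prealloc_reads_zero() and the temporary file name are made up for illustration, and the 0xcd fill byte is an arbitrary non-zero pattern.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the thread: write 1 KiB of
 * non-zero data, preallocate the file out to 256 KiB, then read the
 * whole file back. Returns 1 if every byte past the written 1 KiB
 * reads as zero, 0 if stale data shows up, -1 on any error.
 */
int prealloc_reads_zero(const char *dir)
{
    enum { WRITTEN = 1024, TOTAL = 262144 };
    char path[4096];
    unsigned char *buf;
    int fd, n, result = -1;

    n = snprintf(path, sizeof(path), "%s/prealloc-XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* file disappears once fd is closed */

    buf = malloc(TOTAL);
    if (buf == NULL)
        goto out_close;

    /* Poison the whole buffer so any zeros seen later come from the read. */
    memset(buf, 0xcd, TOTAL);
    if (write(fd, buf, WRITTEN) != WRITTEN || fsync(fd) != 0)
        goto out_free;

    /* posix_fallocate() returns an errno value directly, 0 on success. */
    if (posix_fallocate(fd, 0, TOTAL) != 0)
        goto out_free;

    /* A short read is treated as an error to keep the sketch simple. */
    if (pread(fd, buf, TOTAL, 0) != TOTAL)
        goto out_free;

    result = 1;
    for (size_t i = WRITTEN; i < TOTAL; i++) {
        if (buf[i] != 0) {
            result = 0;           /* stale (non-zero) data past old EOF */
            break;
        }
    }

out_free:
    free(buf);
out_close:
    close(fd);
    return result;
}
```

Calling prealloc_reads_zero() with a directory on the filesystem under test (for example the mount point) and checking for a return value of 1 would confirm that the 1k to 4k tail of the first block and the preallocated range both read back as zeros; a return of 0 would indicate the stale-data case described in the quoted message.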
From owner-xfs@oss.sgi.com Tue Nov 6 11:18:28 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 06 Nov 2007 11:18:33 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA6JIQBR032278 for ; Tue, 6 Nov 2007 11:18:27 -0800 Received: from f237116.upc-f.chello.nl ([80.56.237.116] helo=[192.168.0.111]) by pentafluge.infradead.org with esmtpsa (Exim 4.63 #1 (Red Hat Linux)) id 1IpTfv-0007IT-Cx; Tue, 06 Nov 2007 19:01:23 +0000 Subject: Re: writeout stalls in current -git From: Peter Zijlstra To: David Chinner Cc: Torsten Kaiser , Fengguang Wu , Maxim Levitsky , linux-kernel@vger.kernel.org, Andrew Morton , linux-fsdevel@vger.kernel.org, xfs@oss.sgi.com In-Reply-To: <20071106042527.GT995458@sgi.com> References: <393903856.06449@ustc.edu.cn> <64bb37e0711011120i63cdfe3ci18995d57b6649a8@mail.gmail.com> <64bb37e0711011200n228e708eg255640388f83da22@mail.gmail.com> <1193998532.27652.343.camel@twins> <64bb37e0711021222q7d12c825mc62d433c4fe19e8@mail.gmail.com> <20071102204258.GR995458@sgi.com> <64bb37e0711040319l5de285c3xea64474540a51b6e@mail.gmail.com> <20071105014510.GU66820511@sgi.com> <64bb37e0711051027v49869699s9593ea54713b15ff@mail.gmail.com> <20071106042527.GT995458@sgi.com> Content-Type: text/plain Date: Tue, 06 Nov 2007 20:01:22 +0100 Message-Id: <1194375682.6289.88.camel@twins> Mime-Version: 1.0 X-Mailer: Evolution 2.10.1 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4683/Tue Nov 6 10:30:56 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13573 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: peterz@infradead.org Precedence: bulk X-list: xfs On Tue, 2007-11-06 at 15:25 
+1100, David Chinner wrote: > I'm struggling to understand what possible changed in XFS or writeback that > would lead to stalls like this, esp. as you appear to be removing files when > the stalls occur. Just a crazy idea,.. Could there be a set_page_dirty() that doesn't have balance_dirty_pages() call near? For example modifying meta data in unlink? Such a situation could lead to an excess of dirty pages and the next call to balance_dirty_pages() would appear to stall, as it would desperately try to get below the limit again. From owner-xfs@oss.sgi.com Tue Nov 6 12:26:11 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 06 Nov 2007 12:26:18 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.3 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from py-out-1112.google.com (py-out-1112.google.com [64.233.166.180]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA6KQ1jx011466 for ; Tue, 6 Nov 2007 12:26:10 -0800 Received: by py-out-1112.google.com with SMTP id u77so4148314pyb for ; Tue, 06 Nov 2007 12:26:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=aSteImI9UbqzHyP5g7FCeQ+QCyP8T0EWeHKKbofjW0o=; b=S7gDtLwRUiE5Ovdf8ur4fxwRnoaOXK2sVwA7E6OUz4ya4PK2by4zM+QHiMD9mHqPkz1Efzq8LL0dVaC3m9nRfYT0tYScklfxXCCRNdvnlmmu1+JbkLoPkm55cQP2zRLLaTmunT9GLIZWWaHJOyEp3qti/rRHNfGlhE7Ilm2147E= DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=beta; h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; 
b=QRbL0zT95BKKQ+R/H0hZFxCjLgIYnr0gnPQaQ3b0R5Py/dP7MltHWw118p4ZP2lls6Em6I34ZooZDS6VGmuPG+n8+r+Y1lnl53ncf0FZorpLCHGSJBxP+KoDzH3VqV7FjhB9n2nBppye+f2yZgLDHHLdT7xiMENqLPpeLDHRydM= Received: by 10.65.100.14 with SMTP id c14mr13116180qbm.1194380765201; Tue, 06 Nov 2007 12:26:05 -0800 (PST) Received: by 10.65.112.13 with HTTP; Tue, 6 Nov 2007 12:26:05 -0800 (PST) Message-ID: <64bb37e0711061226l48dce395ub2f9539efc66ecc0@mail.gmail.com> Date: Tue, 6 Nov 2007 21:26:05 +0100 From: "Torsten Kaiser" To: "Peter Zijlstra" Subject: Re: writeout stalls in current -git Cc: "David Chinner" , "Fengguang Wu" , "Maxim Levitsky" , linux-kernel@vger.kernel.org, "Andrew Morton" , linux-fsdevel@vger.kernel.org, xfs@oss.sgi.com In-Reply-To: <1194375682.6289.88.camel@twins> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <393903856.06449@ustc.edu.cn> <1193998532.27652.343.camel@twins> <64bb37e0711021222q7d12c825mc62d433c4fe19e8@mail.gmail.com> <20071102204258.GR995458@sgi.com> <64bb37e0711040319l5de285c3xea64474540a51b6e@mail.gmail.com> <20071105014510.GU66820511@sgi.com> <64bb37e0711051027v49869699s9593ea54713b15ff@mail.gmail.com> <20071106042527.GT995458@sgi.com> <1194375682.6289.88.camel@twins> X-Virus-Scanned: ClamAV 0.91.2/4683/Tue Nov 6 10:30:56 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13574 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: just.for.lkml@googlemail.com Precedence: bulk X-list: xfs On 11/6/07, Peter Zijlstra wrote: > On Tue, 2007-11-06 at 15:25 +1100, David Chinner wrote: > > > I'm struggling to understand what possible changed in XFS or writeback that > > would lead to stalls like this, esp. as you appear to be removing files when > > the stalls occur. > > Just a crazy idea,.. > > Could there be a set_page_dirty() that doesn't have > balance_dirty_pages() call near? 
For example modifying meta data in > unlink? > > Such a situation could lead to an excess of dirty pages and the next > call to balance_dirty_pages() would appear to stall, as it would > desperately try to get below the limit again. Only if accounting of the dirty pages is also broken. In the unmerge testcase I see most of the time only <200kb of dirty data in /proc/meminfo. The system has 4Gb of RAM so I'm not sure if it should ever be valid to stall even the emerge/install testcase. Torsten Now building a kernel with the skipped-pages-accounting-patch reverted... From owner-xfs@oss.sgi.com Tue Nov 6 12:41:11 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 06 Nov 2007 12:41:14 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_42 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA6Kf6iY013333 for ; Tue, 6 Nov 2007 12:41:10 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id HAA07528; Wed, 7 Nov 2007 07:41:04 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lA6Kf2dD96273635; Wed, 7 Nov 2007 07:41:03 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lA6Kf1Tj94970484; Wed, 7 Nov 2007 07:41:01 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Wed, 7 Nov 2007 07:41:00 +1100 From: David Chinner To: Bhagi rathi Cc: David Chinner , xfs@oss.sgi.com Subject: Re: TAKE 972756 - Implement fallocate. 
Message-ID: <20071106204100.GW995458@sgi.com> References: <20071102024314.9BF3458C38F7@chook.melbourne.sgi.com> <20071106001223.GY66820511@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4683/Tue Nov 6 10:30:56 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13575 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Tue, Nov 06, 2007 at 10:57:03PM +0530, Bhagi rathi wrote: > File is of size 1k. A 4k block is allocated as file-system block size is > 4k. > Preallocation happened from 1k to 256k. Now, it looks to me that we have > un-written extents from 4k to 256k. There is no guarantee that data from 1k > to 4k is all zero'es. Fallocate is updating size. Hence on subsequent read, > we can get garbage from 1k to 4k and all zero'es from 4k to 256k # rm /mnt/test/fred # xfs_io -f -c "pwrite 0 1024" -c "fsync" -c "falloc_allocsp 0 262144" -c "bmap -vp" /mnt/test/fred wrote 1024/1024 bytes at offset 0 1 KiB, 1 ops; 0.0000 sec (42.459 MiB/sec and 43478.2609 ops/sec) /mnt/test/fred: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS 0: [0..7]: 14520..14527 0 (14520..14527) 8 00000 1: [8..511]: 345688..346191 0 (345688..346191) 504 10000 # dd if=/mnt/test/fred bs=4k count=1 |od -Ax 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.004566 seconds, 897 kB/s 000000 146715 146715 146715 146715 146715 146715 146715 146715 * 000400 000000 000000 000000 000000 000000 000000 000000 000000 * 001000 Only 1k of modified data, then 3k of zeros, then a bunch of unwritten extents out to EOF. Cheers, Dave. 
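The xfs_io sequence above can be approximated from C with plain POSIX calls. This is only a sketch under stated assumptions: it uses posix_fallocate() in place of the falloc_allocsp command shown in the message, the helper name and the temp-file template are hypothetical, and on filesystems other than XFS it merely exercises the generic expectation that never-written ranges inside the file size read back as zeros. The 0xcd fill byte corresponds to the 146715 octal words (0xcdcd) in the od output.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write 1 KiB of 0xcd, fsync, preallocate out to
 * 256 KiB, then read back the first 4 KiB.
 * Returns 0 if bytes [0, 1024) are 0xcd and bytes [1024, 4096) are zero,
 * 1 if the contents differ, and -1 on any I/O error.
 * path_template must be a writable mkstemp() template ending in "XXXXXX".
 */
static int tail_is_zero_after_prealloc(char *path_template)
{
	unsigned char buf[4096];
	size_t i;
	int ret = -1;
	int fd = mkstemp(path_template);

	if (fd < 0)
		return -1;

	memset(buf, 0xcd, 1024);
	if (pwrite(fd, buf, 1024, 0) != 1024)
		goto out;
	if (fsync(fd) != 0)
		goto out;
	/* posix_fallocate() returns an error number, not -1, on failure. */
	if (posix_fallocate(fd, 0, 262144) != 0)
		goto out;

	/* Poison the buffer so stale contents cannot pass the check. */
	memset(buf, 0xff, sizeof(buf));
	if (pread(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
		goto out;

	ret = 0;
	for (i = 0; i < 1024; i++)
		if (buf[i] != 0xcd)
			ret = 1;
	for (i = 1024; i < sizeof(buf); i++)
		if (buf[i] != 0)
			ret = 1;
out:
	close(fd);
	unlink(path_template);
	return ret;
}
```

Calling it with a template such as "/mnt/test/fred-XXXXXX" on the filesystem under test would show whether the 1k..4k range comes back zeroed there; it says nothing about crash-consistency ordering.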
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Tue Nov 6 12:56:05 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 06 Nov 2007 12:56:09 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA6Ku13c015102 for ; Tue, 6 Nov 2007 12:56:04 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id HAA07893; Wed, 7 Nov 2007 07:55:59 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lA6KtwdD96464484; Wed, 7 Nov 2007 07:55:58 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lA6KtuKm95774644; Wed, 7 Nov 2007 07:55:56 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Wed, 7 Nov 2007 07:55:56 +1100 From: David Chinner To: Cedric - Equinoxe Media Cc: David Chinner , xfs@oss.sgi.com Subject: Re: xfs crash Message-ID: <20071106205556.GZ995458@sgi.com> References: <20071105215135.GA12238@e-m.fr> <20071106082632.GU995458@sgi.com> <20071106092157.GB16694@e-m.fr> <20071106160721.GB25295@e-m.fr> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071106160721.GB25295@e-m.fr> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4684/Tue Nov 6 11:09:39 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13576 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Tue, Nov 06, 2007 
at 05:07:21PM +0100, Cedric - Equinoxe Media wrote: > I just had exactly the same crash again today : > /dev/sda4 on /filer type xfs (rw,noexec,nosuid,nodev,noatime) What did xfs_check tell you about the corruption? > Seems to be again on a setattr() ? Doing a truncation freeing some blocks. What is the client doing (i.e. io patterns, application, etc) to cause this? Can you reproduce it without NFS being used? To track this down I'm going to need a reproducible test case.... Seeing this is a brand new server, have you run any soak or stress test on the raw storage to confirm it is error free? Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Tue Nov 6 14:38:42 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 06 Nov 2007 14:38:48 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.3 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_42 autolearn=no version=3.3.0-r574664 Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA6McbiH026277 for ; Tue, 6 Nov 2007 14:38:41 -0800 Received: from edge.yarra.acx (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id 727B492C5AC; Wed, 7 Nov 2007 09:38:41 +1100 (EST) Subject: Re: TAKE 972756 - Implement fallocate.
From: Nathan Scott Reply-To: nscott@aconex.com To: Bhagi rathi , David Chinner Cc: xfs@oss.sgi.com In-Reply-To: <20071106204100.GW995458@sgi.com> References: <20071102024314.9BF3458C38F7@chook.melbourne.sgi.com> <20071106001223.GY66820511@sgi.com> <20071106204100.GW995458@sgi.com> Content-Type: text/plain Organization: Aconex Date: Wed, 07 Nov 2007 09:38:53 +1100 Message-Id: <1194388733.3862.206.camel@edge.yarra.acx> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4684/Tue Nov 6 11:09:39 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13577 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: xfs On Wed, 2007-11-07 at 07:41 +1100, David Chinner wrote: > > Preallocation happened from 1k to 256k. Now, it looks to me that we > have > > un-written extents from 4k to 256k. There is no guarantee that data > from 1k > > to 4k is all zero'es. That guarantee does exist - when the initial 1K block write is done, the end of the block is zeroed (by the kernel write path). This is always done (guaranteed) and is required independently to unwritten extents. cheers. 
-- Nathan From owner-xfs@oss.sgi.com Tue Nov 6 16:16:13 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 06 Nov 2007 16:16:19 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: **** X-Spam-Status: No, score=4.0 required=5.0 tests=BAYES_99 autolearn=no version=3.3.0-r574664 Received: from atlas.kreativmedia.ch (ns23.kreativmedia.ch [80.74.146.167]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA70G8hF005441 for ; Tue, 6 Nov 2007 16:16:12 -0800 Received: (qmail 31423 invoked by uid 0); 7 Nov 2007 00:49:32 +0100 Date: 7 Nov 2007 00:49:32 +0100 Message-ID: <20071106234932.31422.qmail@atlas.kreativmedia.ch> From: cji@mdpi.org To: linux-xfs@oss.sgi.com MIME-Version: 1.0 Subject: =?utf-8?Q?Re:_Delivery_reports_about_your_e=2Dmail?= Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit Content-Disposition: inline X-Virus-Scanned: ClamAV 0.91.2/4685/Tue Nov 6 13:58:19 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13578 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: cji@mdpi.org Precedence: bulk X-list: xfs Dear Colleague, Thank you very much for your e-mail. Please note that the Chemical Journal on Internet (CJI, ISSN 1523-1623) is no longer published by Molecular Diversity Preservation International (MDPI). Please send your message to cji@chemistrymag.org or visit the journals website at http://www.chemistrymag.org/. Best regards, Dr. Shu-Kun Lin Publisher MDPI -- MDPI Center Matthaeusstrasse 11 CH-4057 Basel Switzerland Tel. 
+41 61 683 77 34 (office) Fax +41 61 302 89 18 From owner-xfs@oss.sgi.com Tue Nov 6 20:58:45 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 06 Nov 2007 20:58:47 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=0.2 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from rv-out-0910.google.com (rv-out-0910.google.com [209.85.198.186]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA74wgQ4021606 for ; Tue, 6 Nov 2007 20:58:44 -0800 Received: by rv-out-0910.google.com with SMTP id k20so1515943rvb for ; Tue, 06 Nov 2007 20:58:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition; bh=/OCOo4yLX3XBaDQTzpW04ma0EgRKZRo8JYDfB4wlBIg=; b=Uv0CbHorBtp086Jl0VQEZcTQn7fSY3ejCkIGrfHG5P9bN+lKV5+3fG55oNQQdO4IgBz5rwLtFvmyA1Q6SdmwD+p/7+jYnJDiJAFaDDOmKsydFTfyAs1Rhs3dzpJo0NWDWU/t+qSIfvtmwuiFcsPZCND7RcYLIIs6hkXZ6o8Oh/s= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition; b=gsXjC/lFtz1weSFvus2RAqiR1dAFuxF/d64BtQq2FnNdTyZqtERr40PfcHALH4fU6MYA9WxmSdjbCbDhVLsGbuEUdSyNv08jFEkaUICNQqA9zTIaqBk3ZiE8OAY/vC9otadD6OVw6OGBoh3zshX7d3Uk9uhsWscvavuOoSfUbqE= Received: by 10.115.88.1 with SMTP id q1mr1689915wal.1194409932422; Tue, 06 Nov 2007 20:32:12 -0800 (PST) Received: by 10.115.88.8 with HTTP; Tue, 6 Nov 2007 20:32:12 -0800 (PST) Message-ID: Date: Wed, 7 Nov 2007 10:02:12 +0530 From: "Manoj Kumar Pradhan" To: xfs@oss.sgi.com Subject: Deviation from XSDM in DM_EVENT_XXX MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Virus-Scanned: ClamAV 0.91.2/4689/Tue Nov 6 20:23:47 2007 on oss.sgi.com 
X-Virus-Status: Clean X-archive-position: 13579 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: manojkp80@gmail.com Precedence: bulk X-list: xfs Hi, Can someone tell me why XFS-DMAPI deviates in the enum DM_EVEN_XXX from the standard? Thanks, Manoj From owner-xfs@oss.sgi.com Tue Nov 6 21:18:59 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 06 Nov 2007 21:19:05 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA75Isf5024386 for ; Tue, 6 Nov 2007 21:18:57 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA20497; Wed, 7 Nov 2007 16:18:52 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lA75IpdD96666406; Wed, 7 Nov 2007 16:18:52 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lA75ImNS96065424; Wed, 7 Nov 2007 16:18:48 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Wed, 7 Nov 2007 16:18:48 +1100 From: David Chinner To: "Robert P. J. Day" Cc: xfs@oss.sgi.com Subject: Re: use is_power_of_2() macro? 
Message-ID: <20071107051848.GI995458@sgi.com> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4689/Tue Nov 6 20:23:47 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13580 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs

On Tue, Nov 06, 2007 at 10:28:44AM -0500, Robert P. J. Day wrote:
>
> given this in fs/xfs/xfs_inode.c:
>
> /*
>  * xfs_iroundup: round up argument to next power of two
>  */
> uint
> xfs_iroundup(
> 	uint	v)
> {
> 	int	i;
> 	uint	m;
>
> 	if ((v & (v - 1)) == 0)
> 		return v;
> 	ASSERT((v & 0x80000000) == 0);
> 	if ((v & (v + 1)) == 0)
> 		return v + 1;
> 	for (i = 0, m = 1; i < 31; i++, m <<= 1) {
> 		if (v & m)
> 			continue;
> 		v |= m;
> 		if ((v & (v + 1)) == 0)
> 			return v + 1;
> 	}
> 	ASSERT(0);
> 	return( 0 );
> }
>
> is there any reason that can't be rewritten with simply
> roundup_pow_of_two() as defined in include/linux/log2.h?
>
> #define roundup_pow_of_two(n)			\
> (						\
> 	__builtin_constant_p(n) ? (		\
> 		(n == 1) ? 1 :			\
> 		(1UL << (ilog2((n) - 1) + 1))	\
> 				   ) :		\
> 	__roundup_pow_of_two(n)			\
> )
>
> just curious.

No - patch please.

Cheers,

Dave.
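For readers comparing the two approaches, the rounding can be sketched in userspace C as below. This is an illustrative sketch, not the kernel implementation: the names is_pow2 and round_up_pow2 are hypothetical, and the bit-smearing technique is swapped in for both the quoted loop and the ilog2-based macro. One difference worth noting: the quoted xfs_iroundup() returns 0 for an input of 0 (its first test passes), whereas this sketch asserts on 0.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero when v is a power of two; 0 is not treated as one. */
static int is_pow2(uint32_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Round v up to the next power of two (v itself if already one).
 * Works by copying the highest set bit of (v - 1) into every lower
 * bit position, then adding one. Valid for 1 <= v <= 2^31.
 */
static uint32_t round_up_pow2(uint32_t v)
{
	assert(v != 0 && v <= 0x80000000u);
	v--;
	v |= v >> 1;
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;
	return v + 1;
}
```

With these helpers, round_up_pow2(5) gives 8 and round_up_pow2(8) stays 8, which matches what the quoted loop computes for the same inputs.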
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Tue Nov 6 21:42:26 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 06 Nov 2007 21:42:36 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=AWL,BAYES_00,HTML_MESSAGE, J_CHICKENPOX_42 autolearn=no version=3.3.0-r574664 Received: from nz-out-0506.google.com (nz-out-0506.google.com [64.233.162.224]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA75gOGb026653 for ; Tue, 6 Nov 2007 21:42:25 -0800 Received: by nz-out-0506.google.com with SMTP id x3so1391256nzd for ; Tue, 06 Nov 2007 21:42:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:references; bh=wlZWvUDl1yZ4aFk5SmE5QW4tW1UDfVwiNd9J8RN7mTk=; b=eIjCB0Sv9TOiztUUbEL+PLfvTO7vL0rx+zc4sVLea302OlHwV8CA6WZS1onJf84Ir9crzPfMxA5MT0IZ4RYRRsr0zSCRyEsE9Uar+1woLJhbvPp/M3mZrWdd3rr7BgelMclrBcFVqasYjQsQj3BhMzsSKKvhDzCWoH+7udMt0Y4= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:references; b=Fz0P8MWta04jnXP4EppGxDEBGmidyZt3FHCEfzJb91Q8aKHRMlwFqcmwOBWJi4X9q+TQh4nK80ylYBqJZ0ppCuErlfveYeDeaT2V4qXYJm+YZBiNxb0HgmOwwXx/IN8QTVBAuTdZ+NZtq7hw42Dn/k7TobcaPl3r/m0xhWQDeok= Received: by 10.142.213.9 with SMTP id l9mr1911578wfg.1194414148592; Tue, 06 Nov 2007 21:42:28 -0800 (PST) Received: by 10.142.162.19 with HTTP; Tue, 6 Nov 2007 21:42:28 -0800 (PST) Message-ID: Date: Wed, 7 Nov 2007 11:12:28 +0530 From: "Bhagi rathi" To: nscott@aconex.com Subject: Re: TAKE 972756 - Implement fallocate. 
Cc: "David Chinner" , xfs@oss.sgi.com In-Reply-To: <1194388733.3862.206.camel@edge.yarra.acx> MIME-Version: 1.0 References: <20071102024314.9BF3458C38F7@chook.melbourne.sgi.com> <20071106001223.GY66820511@sgi.com> <20071106204100.GW995458@sgi.com> <1194388733.3862.206.camel@edge.yarra.acx> X-Virus-Scanned: ClamAV 0.91.2/4689/Tue Nov 6 20:23:47 2007 on oss.sgi.com X-Virus-Status: Clean Content-Type: text/plain Content-Disposition: inline Content-Transfer-Encoding: 7bit Content-length: 1345 X-archive-position: 13581 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jahnu77@gmail.com Precedence: bulk X-list: xfs Since the logged size change and the data I/O are not bound together, it is always possible for the size to reach the disk before the data I/O does. The other problem comes from speculative allocation. A write-back allocation can lead to delayed extents being converted into real ones, which get pruned only at close of the file. If we get fallocate before that, it allocates the extents, but the extents left over from delayed-allocation write-back will not have zeroed content. Conceptually, if fallocate intends to change the size, it is no different from a size-extending write. We do xfs_zero_eof for a write but not in this case. Probably I am missing the context of how fallocate is used, if it has some overloaded semantics. -Thanks, Bhagi. On 11/7/07, Nathan Scott wrote: > > On Wed, 2007-11-07 at 07:41 +1100, David Chinner wrote: > > > Preallocation happened from 1k to 256k. Now, it looks to me that we > > have > > > un-written extents from 4k to 256k. There is no guarantee that data > > from 1k > > > to 4k is all zero'es. > > That guarantee does exist - when the initial 1K block write is done, the > end of the block is zeroed (by the kernel write path). This is always > done (guaranteed) and is required independently to unwritten extents. > > cheers.
> > -- > Nathan > > [[HTML alternate version deleted]] From owner-xfs@oss.sgi.com Tue Nov 6 23:36:26 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 06 Nov 2007 23:36:32 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.3 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from astoria.ccjclearline.com (astoria.ccjclearline.com [64.235.106.9]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA77aMvs007362 for ; Tue, 6 Nov 2007 23:36:26 -0800 Received: from [99.236.101.138] (helo=crashcourse.ca) by astoria.ccjclearline.com with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.68) (envelope-from ) id 1IpfSd-0005EZ-28; Wed, 07 Nov 2007 02:36:27 -0500 Date: Wed, 7 Nov 2007 02:34:36 -0500 (EST) From: "Robert P. J. Day" X-X-Sender: rpjday@localhost.localdomain To: xfs@oss.sgi.com cc: dgc@sgi.com Subject: [PATCH] XFS: Use kernel-supplied "roundup_pow_of_two" for simplicity. Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - astoria.ccjclearline.com X-AntiAbuse: Original Domain - oss.sgi.com X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - crashcourse.ca X-Source: X-Source-Args: X-Source-Dir: X-Virus-Scanned: ClamAV 0.91.2/4691/Tue Nov 6 21:39:41 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13582 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: rpjday@crashcourse.ca Precedence: bulk X-list: xfs Signed-off-by: Robert P. J. Day --- compile-tested on i386. 
 fs/xfs/xfs_inode.c |   32 ++++----------------------------
 fs/xfs/xfs_inode.h |    1 -
 2 files changed, 4 insertions(+), 29 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index abf509a..bcc3d27 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -15,6 +15,8 @@
  * along with this program; if not, write the Free Software Foundation,
  * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
  */
+#include
+
 #include "xfs.h"
 #include "xfs_fs.h"
 #include "xfs_types.h"
@@ -3672,32 +3674,6 @@ xfs_iaccess(
 	return XFS_ERROR(EACCES);
 }
 
-/*
- * xfs_iroundup: round up argument to next power of two
- */
-uint
-xfs_iroundup(
-	uint	v)
-{
-	int	i;
-	uint	m;
-
-	if ((v & (v - 1)) == 0)
-		return v;
-	ASSERT((v & 0x80000000) == 0);
-	if ((v & (v + 1)) == 0)
-		return v + 1;
-	for (i = 0, m = 1; i < 31; i++, m <<= 1) {
-		if (v & m)
-			continue;
-		v |= m;
-		if ((v & (v + 1)) == 0)
-			return v + 1;
-	}
-	ASSERT(0);
-	return( 0 );
-}
-
 #ifdef XFS_ILOCK_TRACE
 ktrace_t	*xfs_ilock_trace_buf;
 
@@ -4204,7 +4180,7 @@ xfs_iext_realloc_direct(
 			return;
 		}
 		if (!is_power_of_2(new_size)){
-			rnew_size = xfs_iroundup(new_size);
+			rnew_size = roundup_pow_of_two(new_size);
 		}
 		if (rnew_size != ifp->if_real_bytes) {
 			ifp->if_u1.if_extents =
@@ -4227,7 +4203,7 @@ xfs_iext_realloc_direct(
 	else {
 		new_size += ifp->if_bytes;
 		if (!is_power_of_2(new_size)) {
-			rnew_size = xfs_iroundup(new_size);
+			rnew_size = roundup_pow_of_two(new_size);
 		}
 		xfs_iext_inline_to_direct(ifp, rnew_size);
 	}
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index e5aff92..e3a552e 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -568,7 +568,6 @@ int xfs_iextents_copy(xfs_inode_t *, xfs_bmbt_rec_t *, int);
 int		xfs_iflush(xfs_inode_t *, uint);
 void		xfs_iflush_all(struct xfs_mount *);
 int		xfs_iaccess(xfs_inode_t *, mode_t, cred_t *);
-uint		xfs_iroundup(uint);
 void		xfs_ichgtime(xfs_inode_t *, int);
 xfs_fsize_t	xfs_file_last_byte(xfs_inode_t *);
 void		xfs_lock_inodes(xfs_inode_t **, int, int, uint);
-- 
======================================================================== Robert P. J. Day Linux Consulting, Training and Annoying Kernel Pedantry Waterloo, Ontario, CANADA http://crashcourse.ca ======================================================================== From owner-xfs@oss.sgi.com Wed Nov 7 01:34:52 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 07 Nov 2007 01:34:56 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.3 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_42 autolearn=no version=3.3.0-r574664 Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA79YoO4025749 for ; Wed, 7 Nov 2007 01:34:52 -0800 Received: from mail.aconex.com (castle.yarra.acx [192.168.3.3]) by postoffice.aconex.com (Postfix) with ESMTP id 2168D92C8E4; Wed, 7 Nov 2007 20:34:56 +1100 (EST) Received: from 192.168.3.1 (proxying for 58.107.42.33) (SquirrelMail authenticated user nscott) by mail.aconex.com with HTTP; Wed, 7 Nov 2007 20:35:21 +1100 (EST) Message-ID: <56697.192.168.3.1.1194428121.squirrel@mail.aconex.com> In-Reply-To: References: <20071102024314.9BF3458C38F7@chook.melbourne.sgi.com> <20071106001223.GY66820511@sgi.com> <20071106204100.GW995458@sgi.com> <1194388733.3862.206.camel@edge.yarra.acx> Date: Wed, 7 Nov 2007 20:35:21 +1100 (EST) Subject: Re: TAKE 972756 - Implement fallocate. 
From: nscott@aconex.com To: "Bhagi rathi" Cc: "David Chinner" , xfs@oss.sgi.com User-Agent: SquirrelMail/1.4.8-4.el4.centos MIME-Version: 1.0 Content-Type: text/plain;charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-Priority: 3 (Normal) Importance: Normal X-Virus-Scanned: ClamAV 0.91.2/4691/Tue Nov 6 21:39:41 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13583 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: xfs > Since size log change and data I/O are not binded, it is always possible > that size can reach to the > disk before I/O reaching to the disk. Not clear what that has to do with whether partial blocks are zeroed or not? Can you give a specific series of steps that would demonstrate a problem? (preferably with a test case) > Also, the other problem is because > of > speculative allocation. > A write-back allocation can leady to allocation of delayed extents into > real > and gets pruned only > close of the file. > Before that, we get fallocate, it allocates the exents, > but the extents residing > because of delayed allocation write-back will not have zero'ed content. Again, I think a test case demonstrating the problem would go a long way to helping explain the issue. The preallocation code and ioctl interface have been in XFS forever on Linux - are you reporting problems you've actually observed here, or are these rather "potential issues" that you foresee from code analysis? cheers. 
-- Nathan From owner-xfs@oss.sgi.com Wed Nov 7 01:55:37 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 07 Nov 2007 01:55:42 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA79tXg3028455 for ; Wed, 7 Nov 2007 01:55:35 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id UAA25829; Wed, 7 Nov 2007 20:55:33 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lA79tWdD96463197; Wed, 7 Nov 2007 20:55:33 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lA79tUun92134906; Wed, 7 Nov 2007 20:55:30 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Wed, 7 Nov 2007 20:55:30 +1100 From: David Chinner To: Manoj Kumar Pradhan Cc: xfs@oss.sgi.com Subject: Re: Deviation from XSDM in DM_EVENT_XXX Message-ID: <20071107095530.GJ995458@sgi.com> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4691/Tue Nov 6 21:39:41 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13584 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Wed, Nov 07, 2007 at 10:02:12AM +0530, Manoj Kumar Pradhan wrote: > Hi, > > Can someone tell me why XFS-DMAPI deviates in the enum DM_EVEN_XXX > from the standard? 
From the spec: (http://www.opengroup.org/onlinepubs/9657099/chap4.htm) " dm_eventtype_t REQUIREMENT This enumeration must contain at least the elements listed here. The DMAPI implementation may choose a different order for the elements. " So as long as we have the events defined, it doesn't matter what their value or order in the enum is. It's a very rubbery spec.... Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Wed Nov 7 02:57:54 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 07 Nov 2007 02:57:58 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from spike.grumly.eu.org (spike.grumly.eu.org [195.5.253.226]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA7AvoJU004674 for ; Wed, 7 Nov 2007 02:57:54 -0800 Received: by spike.grumly.eu.org (Postfix, from userid 1001) id 528F911935; Wed, 7 Nov 2007 11:58:14 +0100 (CET) Date: Wed, 7 Nov 2007 11:58:14 +0100 From: Cedric - Equinoxe Media To: David Chinner Cc: xfs@oss.sgi.com Subject: Re: xfs crash Message-ID: <20071107105814.GD25295@e-m.fr> References: <20071105215135.GA12238@e-m.fr> <20071106082632.GU995458@sgi.com> <20071106092157.GB16694@e-m.fr> <20071106160721.GB25295@e-m.fr> <20071106205556.GZ995458@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20071106205556.GZ995458@sgi.com> X-Virus-Scanned: ClamAV 0.91.2/4691/Tue Nov 6 21:39:41 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13585 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: cedric@e-m.fr Precedence: bulk X-list: xfs On 07/11/2007 07:55, David Chinner wrote: > On Tue, Nov 06, 2007 at 05:07:21PM +0100, Cedric - Equinoxe Media wrote: > 
> I just had exactly the same crash again today : > > /dev/sda4 on /filer type xfs (rw,noexec,nosuid,nodev,noatime) > > What did xfs_check tell you about the corruption? I had quite the same message as last time, forgot to copy/paste... > > Seems to be again on a setattr() ? > > Doing a truncation freeing some blocks. > > What is the client doing (i.e. io patterns, application, etc) to > cause this? can you reproduce it without NFS being used? To track > this down I'm going to need a reproducable test case.... It is 5 web servers with php as nfsv3 clients. > Seeing this is a brand new server, have you run and soak or stress > test on the raw storage to confirm it is error free? I have run bonnie++ and memtest86+ with no errors. I am now trying to recompile linux without nfsv4, ACL and all experimental features of nfs and xfs. -- Cédric Tabary Ingénieur réseau - Equinoxe Media +33 (0)6 77 45 80 15 From owner-xfs@oss.sgi.com Wed Nov 7 16:38:51 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 07 Nov 2007 16:39:18 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA80cmTx011409 for ; Wed, 7 Nov 2007 16:38:50 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA17067; Thu, 8 Nov 2007 11:38:52 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lA80cpdD95857910; Thu, 8 Nov 2007 11:38:51 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lA80cmCg97600867; Thu, 8 Nov 2007 11:38:48 +1100 (AEDT) X-Authentication-Warning: 
snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Thu, 8 Nov 2007 11:38:48 +1100 From: David Chinner To: xfs-oss Cc: xfs-dev Subject: [patch] Fix broken inode clustering Message-ID: <20071108003848.GA66820511@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4694/Wed Nov 7 10:55:51 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13586 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs The radix tree based inode caches did away with the inode cluster hashes, replacing them with a bunch of masking and gang lookups on the radix tree. This masking got broken when moving the code to per-ag radix trees and indexing by agino # rather than straight inode number. The result is clustered inode writeback does not cluster and things can go extremely slowly when there are lots of inodes to write. The following patch fixes this up by comparing agino # of the inode found to the index of the cluster we are looking for. 
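The masking problem described above can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: AGINO_BITS and INODES_PER_CLUSTER are made-up constants (the real values come from the filesystem geometry), and make_ino, ino_to_agino, cluster_index and the two same_cluster_* helpers are hypothetical names, not XFS functions. The point is only that once the lookup index is an AG-relative inode number, masking the full inode number leaves the AG bits in place, so the comparison can only succeed in AG 0.

```c
#include <assert.h>
#include <stdint.h>

#define AGINO_BITS		20u	/* hypothetical: bits of AG-relative inode number */
#define INODES_PER_CLUSTER	16u	/* hypothetical: must be a power of two */

/* Build a full inode number from AG number and AG-relative inode number. */
static uint64_t make_ino(uint32_t agno, uint32_t agino)
{
	return ((uint64_t)agno << AGINO_BITS) | agino;
}

/* Strip the AG number, leaving the AG-relative inode number. */
static uint32_t ino_to_agino(uint64_t ino)
{
	return (uint32_t)(ino & ((1u << AGINO_BITS) - 1));
}

/* First AG-relative inode number of the cluster containing agino. */
static uint32_t cluster_index(uint32_t agino)
{
	return agino & ~(INODES_PER_CLUSTER - 1);
}

/* Pre-fix style check: masks the full inode number, AG bits included. */
static int same_cluster_old(uint64_t found_ino, uint32_t first_index)
{
	uint64_t mask = ~(uint64_t)(INODES_PER_CLUSTER - 1);

	return (found_ino & mask) == first_index;
}

/* Post-fix style check: converts to the AG-relative number before masking. */
static int same_cluster_fixed(uint64_t found_ino, uint32_t first_index)
{
	return cluster_index(ino_to_agino(found_ino)) == first_index;
}
```

In this model an inode at AG 3, agino 37 belongs to the cluster starting at agino 32; same_cluster_old() rejects it while same_cluster_fixed() accepts it, which is the "does not cluster" symptom in miniature.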
Signed-off-by: Dave Chinner
Tested-by: Torsten Kaiser
---
 fs/xfs/xfs_iget.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: 2.6.x-xfs-new/fs/xfs/xfs_iget.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_iget.c	2007-11-02 13:44:46.000000000 +1100
+++ 2.6.x-xfs-new/fs/xfs/xfs_iget.c	2007-11-07 13:08:42.534440675 +1100
@@ -248,7 +248,7 @@ finish_inode:
 	icl = NULL;
 	if (radix_tree_gang_lookup(&pag->pag_ici_root, (void**)&iq,
 							first_index, 1)) {
-		if ((iq->i_ino & mask) == first_index)
+		if ((XFS_INO_TO_AGINO(mp, iq->i_ino) & mask) == first_index)
 			icl = iq->i_cluster;
 	}
 

From owner-xfs@oss.sgi.com Wed Nov 7 18:34:38 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 07 Nov 2007 18:34:46 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=0.4 required=5.0 tests=ANY_BOUNCE_MESSAGE,AWL, BAYES_50,VBOUNCE_MESSAGE autolearn=no version=3.3.0-r574664 Received: from omr-m23.mx.aol.com (omr-m23.mx.aol.com [64.12.136.131]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA82YZ1P022538 for ; Wed, 7 Nov 2007 18:34:38 -0800 Received: from rly-mc05.mail.aol.com (rly-mc05.mail.aol.com [172.20.118.147]) by omr-m23.mx.aol.com (v117.7) with ESMTP id MAILOMRM234-7dff473275c0357; Wed, 07 Nov 2007 21:34:40 -0400 Received: from localhost (localhost) by rly-mc05.mail.aol.com (8.8.8/8.8.8/AOL-5.0.0) with internal id VAA21506; Wed, 7 Nov 2007 21:34:40 -0500 (EST) Date: Wed, 7 Nov 2007 21:34:40 -0500 (EST) From: Mail Delivery Subsystem Message-Id: <200711080234.VAA21506@rly-mc05.mail.aol.com> To: MIME-Version: 1.0 Content-Type: multipart/report; report-type=delivery-status; boundary="VAA21506.1194489280/rly-mc05.mail.aol.com" Subject: Returned mail: Service unavailable Auto-Submitted: auto-generated (failure) X-AOL-INRLY: host121.sleepys.com [65.200.161.121] rly-mc05 X-AOL-IP: 172.20.118.147 X-Virus-Scanned: ClamAV 0.91.2/4695/Wed
Nov 7 16:08:56 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13587 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: MAILER-DAEMON@aol.com Precedence: bulk X-list: xfs This is a MIME-encapsulated message --VAA21506.1194489280/rly-mc05.mail.aol.com The original message was received at Wed, 7 Nov 2007 21:34:22 -0500 (EST) from host121.sleepys.com [65.200.161.121] *** ATTENTION *** Your e-mail is being returned to you because there was a problem with its delivery. The address which was undeliverable is listed in the section labeled: "----- The following addresses had permanent fatal errors -----". The reason your mail is being returned to you is listed in the section labeled: "----- Transcript of Session Follows -----". The line beginning with "<<<" describes the specific reason your e-mail could not be delivered. The next line contains a second error message which is a general translation for other e-mail servers. Please direct further questions regarding this message to your e-mail administrator. --AOL Postmaster ----- The following addresses had permanent fatal errors ----- ----- Transcript of session follows ----- ... while talking to air-mc04.mail.aol.com.: >>> DATA <<< 554 TRANSACTION FAILED - Unrepairable Virus Detected. Your mail has not been sent. 554 ... Service unavailable --VAA21506.1194489280/rly-mc05.mail.aol.com Content-Type: message/delivery-status Reporting-MTA: dns; rly-mc05.mail.aol.com Arrival-Date: Wed, 7 Nov 2007 21:34:22 -0500 (EST) Final-Recipient: RFC822; rsexymama1965@aol.com Action: failed Status: 5.0.0 Remote-MTA: DNS; air-mc04.mail.aol.com Diagnostic-Code: SMTP; 554 TRANSACTION FAILED - Unrepairable Virus Detected. Your mail has not been sent. 
Last-Attempt-Date: Wed, 7 Nov 2007 21:34:40 -0500 (EST) --VAA21506.1194489280/rly-mc05.mail.aol.com Content-Type: text/rfc822-headers Received: from oss.sgi.com (host121.sleepys.com [65.200.161.121]) by rly-mc05.mail.aol.com (v120.9) with ESMTP id MAILRELAYINMC510-12c473275ad1f; Wed, 07 Nov 2007 21:34:21 -0400 From: linux-xfs@oss.sgi.com To: rsexymama1965@aol.com Subject: RETURNED MAIL: DATA FORMAT ERROR Date: Wed, 7 Nov 2007 21:34:21 -0500 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_NextPart_000_0006_AB6620E8.A6952C57" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2600.0000 X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2600.0000 X-AOL-IP: 65.200.161.121 X-AOL-SCOLL-SCORE: 0:2:268420784:9395240 X-AOL-SCOLL-URL_COUNT: X-AOL-SCOLL-AUTHENTICATION: listenair ; SPF_helo : X-AOL-SCOLL-AUTHENTICATION: listenair ; SPF_822_from : Message-ID: <200711072134.12c473275ad1f@rly-mc05.mail.aol.com> --VAA21506.1194489280/rly-mc05.mail.aol.com-- From owner-xfs@oss.sgi.com Wed Nov 7 19:12:55 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 07 Nov 2007 19:12:58 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.3 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA83Co1V026200 for ; Wed, 7 Nov 2007 19:12:53 -0800 Received: from timothy-shimmins-power-mac-g5.local (boing.melbourne.sgi.com [134.14.55.141]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA20497; Thu, 8 Nov 2007 14:12:49 +1100 Message-ID: <47327ED2.8060402@sgi.com> Date: Thu, 08 Nov 2007 14:13:22 +1100 From: Timothy Shimmin User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728) MIME-Version: 1.0 To: Roger Willcocks CC: xfs@oss.sgi.com Subject: Re: bug: truncate to zero + setuid 
References: <47249E7A.7060709@filmlight.ltd.uk> <47252F62.6030503@sgi.com> <47262CD0.5010708@filmlight.ltd.uk> <4726ADAE.9070206@sgi.com> <472769A1.5090605@filmlight.ltd.uk> <472A7940.5070800@sgi.com> <000001c81f3e$eff344b0$6501a8c0@BODDINGTON> In-Reply-To: <000001c81f3e$eff344b0$6501a8c0@BODDINGTON> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4699/Wed Nov 7 18:08:23 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13588 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: tes@sgi.com Precedence: bulk X-list: xfs Hi Roger, Roger Willcocks wrote: > Timothy Shimmin wrote: >> Hi Roger, >> > ... >> I don't like all these inconsistencies. > > Take a look at the attached patch relative to the current cvs (it's a > bit big to put > inline). The basic problem is it's currently unclear when to set the > times from > va_atime etc. and when to set them to the current time. So I've used the > already > defined XFS_AT_UPDxTIME flags to indicate that a time should be set to > 'now' > and XFS_AT_xTIME to mean set it using va_xtime. This seems to fit well with > the current code and I wonder if that's how it was meant to work in the > first > place. Yeah, I've looked at this a few times now ;-) and this _seems_ like a reasonable thing to do to me. So patch: ATTR_ATIME_SET => XFS_AT_ATIME (& set va_atime etc) (used to set to given time) ATTR_ATIME => XFS_AT_UPDATIME (used to set to "now") likewise for M variant. Previously: ATTR_ATIME_SET => ATTR_UTIME flag (used to set given time) must expect ATTR_ATIME to be set too to get va_atime ATTR_ATIME => XFS_AT_ATIME (& set va_atime) (used to set to "now") a bit confusing since it can store va_atime even if ATTR_ATIME_SET is not on > I've also removed the now redundant ATTR_UTIME flag and pulled > the null truncate to the top, which simplifies things. 
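The flag translation summarised above ("So patch: ...") can be sketched as a small mapping function. This is only an illustration under stated assumptions: the bit values and the names IA_*, AT_* and map_time_flags are hypothetical stand-ins for the kernel's ATTR_* and XFS_AT_* constants, and the rule that an explicit time wins when both the plain and the _SET flag are present is a guess, not something confirmed in this thread.

```c
/*
 * Hypothetical bit values for illustration; the real ATTR_* and XFS_AT_*
 * constants are defined in the kernel headers and differ from these.
 */
enum { IA_ATIME = 1 << 0, IA_ATIME_SET = 1 << 1,
       IA_MTIME = 1 << 2, IA_MTIME_SET = 1 << 3 };
enum { AT_ATIME = 1 << 0, AT_MTIME = 1 << 1,
       AT_UPDATIME = 1 << 2, AT_UPDMTIME = 1 << 3 };

/*
 * Translate VFS-style time flags into XFS-style ones:
 *   *_SET   -> AT_xTIME     (use the caller-supplied time)
 *   plain   -> AT_UPDxTIME  (set the time to "now")
 * Assumption: the explicit-time flag takes precedence when both are set.
 */
static unsigned int map_time_flags(unsigned int ia_valid)
{
	unsigned int mask = 0;

	if (ia_valid & IA_ATIME_SET)
		mask |= AT_ATIME;
	else if (ia_valid & IA_ATIME)
		mask |= AT_UPDATIME;

	if (ia_valid & IA_MTIME_SET)
		mask |= AT_MTIME;
	else if (ia_valid & IA_MTIME)
		mask |= AT_UPDMTIME;

	return mask;
}
```

Written this way, a given time and a "now" update are never both requested for the same timestamp, which is the ambiguity the old ATTR_UTIME arrangement left open.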
> So these changes of: if (mask & (XFS_AT_ATIME|XFS_AT_MTIME)) { if (!file_owner) { - if ((flags & ATTR_UTIME) && - !capable(CAP_FOWNER)) { + if (!capable(CAP_FOWNER)) { Where you take out ATTR_UTIME make sense since XFS_AT_ATIME et al, now refer to the case where a given time is provided instead of requiring ATTR_UTIME to be set. > One query: in both xfs_iops.c/xfs_vn_setattr and > xfs_dm.c/xfs_dm_set_fileattr the > ATIME branch sets the inode's atime directly. xfs_vn_setattr() if (ia_valid & ATTR_ATIME) { vattr.va_mask |= XFS_AT_ATIME; vattr.va_atime = attr->ia_atime; inode->i_atime = attr->ia_atime; } xfs_dm_set_fileattr() if (mask & DM_AT_ATIME) { vat.va_mask |= XFS_AT_ATIME; vat.va_atime.tv_sec = stat.fa_atime; vat.va_atime.tv_nsec = 0; inode->i_atime.tv_sec = stat.fa_atime; } Hmmm.... So this could change behavior for xfs_vn_setattr(). If previously we had ATTR_ATIME set but NOT ATTR_ATIME_SET, then we would set inode->i_atime. Now with the patch, in this case, we don't set inode->i_atime at this point. However, in this case we wouldn't want i_atime to be set to ia_atime as we would want it to be set to "now" in xfs_ichgtime(). > This is probably something > to do with > the comment above xfs_iops.c/xfs_ichgtime ('to make sure the access time > update > will take') but it could probably be handled better. > I'll need to look. >> BTW, your locking looks wrong - it appears you don't unlock when the >> file is non-zero size. > > Oops... > I was also thinking of a read lock here. And initializing quot vars to zero in variable definition at top. This stuff really needs to be QA'ed well. It would be too easy to get a regression in expected behavior. Need to hunt out qa tests. Thanks for the effort, Tim. 
From owner-xfs@oss.sgi.com Wed Nov 7 22:45:42 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 07 Nov 2007 22:45:45 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA86jaUx015322 for ; Wed, 7 Nov 2007 22:45:40 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA24077; Thu, 8 Nov 2007 17:45:34 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lA86jXdD97868567; Thu, 8 Nov 2007 17:45:33 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lA86jUrZ97942740; Thu, 8 Nov 2007 17:45:30 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Thu, 8 Nov 2007 17:45:30 +1100 From: David Chinner To: "Robert P. J. Day" Cc: xfs@oss.sgi.com, dgc@sgi.com Subject: Re: [PATCH] XFS: Use kernel-supplied "roundup_pow_of_two" for simplicity. Message-ID: <20071108064530.GE66820511@sgi.com> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4703/Wed Nov 7 20:19:56 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13589 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Wed, Nov 07, 2007 at 02:34:36AM -0500, Robert P. J. Day wrote: > > Signed-off-by: Robert P. J. Day > > --- > > compile-tested on i386. Thanks. I'll QA it and queue it up for .25. 
Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Thu Nov 8 03:27:12 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 08 Nov 2007 03:27:16 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from waldorf.loreland.org (uk.loreland.org [89.16.172.112]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA8BR7Ub026046 for ; Thu, 8 Nov 2007 03:27:12 -0800 Received: by waldorf.loreland.org (Postfix, from userid 33) id B00BC200AA; Thu, 8 Nov 2007 11:27:09 +0000 (GMT) To: xfs@oss.sgi.com Subject: xfs_repair 2.9.4 threading/progress info missing? MIME-Version: 1.0 Date: Thu, 8 Nov 2007 11:27:09 +0000 From: James Braid Message-ID: X-Sender: jamesb@loreland.org User-Agent: RoundCube Webmail/0.1-rc1 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit X-Virus-Scanned: ClamAV 0.91.2/4708/Wed Nov 7 22:07:54 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13590 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jamesb@loreland.org Precedence: bulk X-list: xfs I just upgraded to xfsprogs 2.9.4 and it seems the multi-threading and progress information is no longer being reported? 2.8.18: - creating 8 worker thread(s) Phase 1 - find and verify superblock... - reporting progress in intervals of 15 minutes Phase 2 - using internal log - zero log... - scan filesystem freespace and inode maps... 2.9.4: Phase 1 - find and verify superblock... Phase 2 - using internal log - zero log... - scan filesystem freespace and inode maps... - found root inode chunk Have I just mis-compiled something? The progress information in particular was REALLY useful on our bigger filesystems.
From owner-xfs@oss.sgi.com Thu Nov 8 05:13:50 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 08 Nov 2007 05:13:53 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00, SUBJECT_FUZZY_TION autolearn=no version=3.3.0-r574664 Received: from r2d2.neofacto.lu (mail.neofacto.lu [158.64.60.195]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA8DDlAj009037 for ; Thu, 8 Nov 2007 05:13:50 -0800 Received: from localhost (localhost [127.0.0.1]) by r2d2.neofacto.lu (Postfix) with ESMTP id 8BE67C2F95; Thu, 8 Nov 2007 14:13:51 +0100 (CET) X-Virus-Scanned: ClamAV 0.91.2/4708/Wed Nov 7 22:07:54 2007 on oss.sgi.com X-Virus-Scanned: Ubuntu amavisd-new at neofacto.lu Received: from r2d2.neofacto.lu ([127.0.0.1]) by localhost (r2d2.neofacto.lu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 8WEtiOV3IGPu; Thu, 8 Nov 2007 14:13:37 +0100 (CET) Received: by r2d2.neofacto.lu (Postfix, from userid 65534) id B8116C2F99; Thu, 8 Nov 2007 14:13:37 +0100 (CET) Received: from [192.168.1.166] (SU105.tudor.lu [158.64.4.205]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by r2d2.neofacto.lu (Postfix) with ESMTP id 10BC1C2F95; Thu, 8 Nov 2007 14:13:28 +0100 (CET) Message-ID: <47330B8E.5010008@jamendo.com> Date: Thu, 08 Nov 2007 14:13:50 +0100 From: Amandine AUPETIT User-Agent: Thunderbird 2.0.0.6 (X11/20071022) MIME-Version: 1.0 To: Eric Sandeen CC: Joshua Baker-LePain , xfs@oss.sgi.com Subject: Re: 7Tb XFS partition lost on reboot References: <473072FD.4070104@jamendo.com> <4730B98C.5090008@sandeen.net> In-Reply-To: <4730B98C.5090008@sandeen.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-Virus-Status: Clean X-archive-position: 13591 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: 
amandine@jamendo.com Precedence: bulk X-list: xfs Hi, Thanks for the advice ! I checked the label with xfs_admin -l : # xfs_admin -l /dev/cciss/c1d0p1 label = "" I tried to blank it in case there is something invisible : # xfs_admin -L -- /dev/cciss/c1d0p1 writing all SBs new label = "" But it seems to be the same. :( Amandine Eric Sandeen wrote: > Joshua Baker-LePain wrote: > >> On Tue, 6 Nov 2007 at 2:58pm, Amandine AUPETIT wrote >> >> >>> So I created the partition with parted, because fdisk can't do more than 2tb >>> partitions. >>> It's ok, I can do what I want but... >>> >>> on reboot, there is a Superblock problem, something like that. When I check >>> with xfs_check : >>> >> First guess -- did you use a gpt disklabel on that device? Standard >> (msdos) disklabels don't work on devices >2TB. The usual symptom of a big >> device with an msdos disklabel is that the partition table goes away on >> reboot. >> > > I second that hunch. :) > > -Eric > From owner-xfs@oss.sgi.com Thu Nov 8 06:41:33 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 08 Nov 2007 06:41:38 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS, SUBJECT_FUZZY_TION autolearn=no version=3.3.0-r574664 Received: from sandeen.net (sandeen.net [209.173.210.139]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA8EfSDM021423 for ; Thu, 8 Nov 2007 06:41:32 -0800 Received: from liberator.sandeen.net (liberator.sandeen.net [10.0.0.4]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by sandeen.net (Postfix) with ESMTP id 26C8D18008612; Thu, 8 Nov 2007 08:41:33 -0600 (CST) Message-ID: <4733201C.9060802@sandeen.net> Date: Thu, 08 Nov 2007 08:41:32 -0600 From: Eric Sandeen User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728) MIME-Version: 1.0 To: Amandine AUPETIT CC: Joshua Baker-LePain , xfs@oss.sgi.com
Subject: Re: 7Tb XFS partition lost on reboot References: <473072FD.4070104@jamendo.com> <4730B98C.5090008@sandeen.net> <47330B8E.5010008@jamendo.com> In-Reply-To: <47330B8E.5010008@jamendo.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4708/Wed Nov 7 22:07:54 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13592 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: sandeen@sandeen.net Precedence: bulk X-list: xfs Amandine AUPETIT wrote: > Hi, > Thanks for the advice ! > I checked the label with xfs_admin -l : > # xfs_admin -l /dev/cciss/c1d0p1 > label = "" > > I tried to blank it in case there is something invisible : > # xfs_admin -L -- /dev/cciss/c1d0p1 > writing all SBs > new label = "" > > But it seems to be the same. :( No, not the filesystem label, the disklabel, otherwise known as the partition table - dos vs. gpt. This is something you set with parted or fdisk, not xfs_admin. 
-Eric From owner-xfs@oss.sgi.com Thu Nov 8 06:59:36 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 08 Nov 2007 06:59:39 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-0.9 required=5.0 tests=AWL,BAYES_40, SUBJECT_FUZZY_TION autolearn=no version=3.3.0-r574664 Received: from r2d2.neofacto.lu (mail.neofacto.lu [158.64.60.195]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA8ExWfY024308 for ; Thu, 8 Nov 2007 06:59:35 -0800 Received: from localhost (localhost [127.0.0.1]) by r2d2.neofacto.lu (Postfix) with ESMTP id B37FFC2FC7; Thu, 8 Nov 2007 15:59:37 +0100 (CET) X-Virus-Scanned: ClamAV 0.91.2/4708/Wed Nov 7 22:07:54 2007 on oss.sgi.com X-Virus-Scanned: Ubuntu amavisd-new at neofacto.lu Received: from r2d2.neofacto.lu ([127.0.0.1]) by localhost (r2d2.neofacto.lu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id iBCyDkDgGrAb; Thu, 8 Nov 2007 15:59:22 +0100 (CET) Received: by r2d2.neofacto.lu (Postfix, from userid 65534) id 4A3BDC2FC8; Thu, 8 Nov 2007 15:59:22 +0100 (CET) Received: from [192.168.1.166] (SU105.tudor.lu [158.64.4.205]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by r2d2.neofacto.lu (Postfix) with ESMTP id 6BAE9C2FC7; Thu, 8 Nov 2007 15:59:12 +0100 (CET) Message-ID: <47332456.9030805@jamendo.com> Date: Thu, 08 Nov 2007 15:59:34 +0100 From: Amandine AUPETIT User-Agent: Thunderbird 2.0.0.6 (X11/20071022) MIME-Version: 1.0 To: Eric Sandeen CC: Joshua Baker-LePain , xfs@oss.sgi.com Subject: Re: 7Tb XFS partition lost on reboot References: <473072FD.4070104@jamendo.com> <4730B98C.5090008@sandeen.net> <47330B8E.5010008@jamendo.com> <4733201C.9060802@sandeen.net> In-Reply-To: <4733201C.9060802@sandeen.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-Virus-Status: Clean X-archive-position: 13593 X-ecartis-version: Ecartis v1.0.0 Sender: 
xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: amandine@jamendo.com Precedence: bulk X-list: xfs Ok, sorry for the mix-up. You were right, the problem was with the partition table. Actually, even though it was a gpt partition table, it was corrupted. It seems to be an Ubuntu parted problem. I had to rebuild parted from source and when I ran it, it said : # ./parted GNU Parted 1.8.8 Using /dev/cciss/c1d0 Welcome to GNU Parted! Type 'help' to view a list of commands. (parted) print print Warning: /dev/cciss/c1d0 contains GPT signatures, indicating that it has a GPT table. However, it does not have a valid fake msdos partition table, as it should. Perhaps it was corrupted -- possibly by a program that doesn't understand GPT partition tables. Or perhaps you deleted the GPT table, and are now using an msdos partition table. Is this a GPT partition table? Yes/No? yes yes Model: Compaq Smart Array (cpqarray) Disk /dev/cciss/c1d0: 7501GB Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 17.4kB 7501GB 7501GB xfs primary So I deleted the existing partition and recreated a new one. I recovered all my data on this new partition, and on reboot there is no problem anymore. Thanks a lot for your help ! :) Amandine Eric Sandeen wrote: > Amandine AUPETIT wrote: > >> Hi, >> Thanks for the advice ! >> I checked the label with xfs_admin -l : >> # xfs_admin -l /dev/cciss/c1d0p1 >> label = "" >> >> I tried to blank it in case there is something invisible : >> # xfs_admin -L -- /dev/cciss/c1d0p1 >> writing all SBs >> new label = "" >> >> But it seems to be the same. :( >> > > No, not the filesystem label, the disklabel, otherwise known as the > partition table - dos vs. gpt. This is something you set with parted or > fdisk, not xfs_admin.
> > -Eric > From owner-xfs@oss.sgi.com Thu Nov 8 07:14:31 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 08 Nov 2007 07:14:34 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS, SUBJECT_FUZZY_TION autolearn=no version=3.3.0-r574664 Received: from sandeen.net (sandeen.net [209.173.210.139]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA8FET4r026436 for ; Thu, 8 Nov 2007 07:14:30 -0800 Received: from liberator.sandeen.net (liberator.sandeen.net [10.0.0.4]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by sandeen.net (Postfix) with ESMTP id 3890C18008614; Thu, 8 Nov 2007 09:14:35 -0600 (CST) Message-ID: <473327DA.8070909@sandeen.net> Date: Thu, 08 Nov 2007 09:14:34 -0600 From: Eric Sandeen User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728) MIME-Version: 1.0 To: Amandine AUPETIT CC: Joshua Baker-LePain , xfs@oss.sgi.com Subject: Re: 7Tb XFS partition lost on reboot References: <473072FD.4070104@jamendo.com> <4730B98C.5090008@sandeen.net> <47330B8E.5010008@jamendo.com> <4733201C.9060802@sandeen.net> <47332456.9030805@jamendo.com> In-Reply-To: <47332456.9030805@jamendo.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4708/Wed Nov 7 22:07:54 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13594 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: sandeen@sandeen.net Precedence: bulk X-list: xfs Amandine AUPETIT wrote: > So I deleted the existing partition and recreated a new one. I found > back all my data on this new partition, and on reboot there is no > problem anymore. 
Wow, I love a happy ending, especially when it involves 7T of data ;-) -Eric From owner-xfs@oss.sgi.com Thu Nov 8 15:29:22 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 08 Nov 2007 15:29:26 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA8NTIRM001191 for ; Thu, 8 Nov 2007 15:29:21 -0800 Received: from pc-bnaujok.melbourne.sgi.com (pc-bnaujok.melbourne.sgi.com [134.14.55.58]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA16669; Fri, 9 Nov 2007 10:29:18 +1100 Date: Fri, 09 Nov 2007 10:29:28 +1100 To: "James Braid" , xfs@oss.sgi.com Subject: Re: xfs_repair 2.9.4 threading/progress info missing? From: "Barry Naujok" Organization: SGI Content-Type: text/plain; format=flowed; delsp=yes; charset=utf-8 MIME-Version: 1.0 References: Message-ID: In-Reply-To: User-Agent: Opera Mail/9.24 (Win32) X-Virus-Scanned: ClamAV 0.91.2/4715/Thu Nov 8 14:31:53 2007 on oss.sgi.com X-Virus-Status: Clean Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from Quoted-Printable to 8bit by oss.sgi.com id lA8NTMRM001213 X-archive-position: 13595 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: bnaujok@sgi.com Precedence: bulk X-list: xfs On Thu, 08 Nov 2007 22:27:09 +1100, James Braid wrote: > > I just upgraded to xfsprogs 2.9.4 and it seems the multi-threading and > progress information is no longer being reported? > > 2.8.18: > - creating 8 worker thread(s) > Phase 1 - find and verify superblock... > - reporting progress in intervals of 15 minutes > Phase 2 - using internal log > - zero log... > - scan filesystem freespace and inode maps...
> > 2.9.4: > Phase 1 - find and verify superblock... > Phase 2 - using internal log > - zero log... > - scan filesystem freespace and inode maps... > - found root inode chunk > > Have I just mis-compiled something? The progress information in > particular > was REALLY useful on our bigger filesystems. Hi, It is sort of still there. With the performance improvements in 2.9.4, the multithreading implementation was changed and hidden from the user. A side effect of this change is that the progress info is now only visible when using the ag_stride option. Depending on your drive layout, ag_stride may speed up repair even further, especially on concats. With other layouts, try "-o ag_stride=1". I found that on one of our RAIDs, that actually doubled the repair performance on a 5-way stripe. Progress info for normal runs will be reintroduced in the near future and other output will be cleaned up a tad. Regards, Barry. From owner-xfs@oss.sgi.com Thu Nov 8 16:34:04 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 08 Nov 2007 16:34:07 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_05,J_CHICKENPOX_62, J_CHICKENPOX_64 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA90XxAB017738 for ; Thu, 8 Nov 2007 16:34:01 -0800 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA18419; Fri, 9 Nov 2007 11:33:58 +1100 Message-ID: <4733AB27.70208@sgi.com> Date: Fri, 09 Nov 2007 11:34:47 +1100 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.6 (X11/20070728) MIME-Version: 1.0 To: David Chinner CC: xfs-oss , xfs-dev Subject: Re: [PATCH, RFC] Move AIL pushing into a separate thread References:
<20071105050706.GW66820511@sgi.com> In-Reply-To: <20071105050706.GW66820511@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4715/Thu Nov 8 14:31:53 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13596 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs I like the sound of this, Dave. I'm still going through the code in detail. Could we convert the ail lock into a mutex to ease the load? I know it may not improve throughput but it would at least free up the CPUs to do other stuff. David Chinner wrote: > When many hundreds to thousands of threads all try to do simultaneous > transactions and the log is in a tail-pushing situation (i.e. full), > we can get multiple threads walking the AIL list and contending on > the AIL lock. > > Recently we've had two cases of machines basically locking up because > most of the CPUs in the system are trying to obtain the AIL lock. > The first was an 8p machine with ~2,500 kernel threads trying to > do transactions, and the latest is a 2048p altix closing a file per > MPI rank in a synchronised fashion resulting in > 400 processes > all trying to walk and push the AIL at the same time. > > The AIL push is, in effect, a simple I/O dispatch algorithm complicated > by the ordering constraints placed on it by the transaction subsystem. > It really does not need multiple threads to push on it - even when > only a single CPU is pushing the AIL, it can push the I/O out far faster > than pretty much any disk subsystem can handle. > > So, to avoid contention problems stemming from multiple list walkers, > move the list walk off into another thread and simply provide a "target" > to push to.
When a thread requires a push, it sets the target and wakes > the push thread, then goes to sleep waiting for the required amount > of space to become available in the log. > > This mechanism should also be a lot fairer under heavy load as the > waiters will queue in arrival order, rather than queuing in "who completed > a push first" order. > > Also, by moving the pushing to a separate thread we can do more effective > overload detection and prevention as we can keep context from loop iteration > to loop iteration. That is, we can push only part of the list each loop and not > have to loop back to the start of the list every time we run. This should > also help by reducing the number of items we try to lock and/or push items > that we cannot move. > > Note that this patch is not intended to solve the inefficiencies in the > AIL structure and the associated issues with extremely large list contents. > That needs to be addressed separately; parallel access would cause problems > to any new structure as well, so I'm only aiming to isolate the structure > from unbounded parallelism here.
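The "set a target, wake one pusher" idea described above can be modelled without any threads at all. The sketch below is a single-threaded toy under loud assumptions: struct pusher, request_push and push_step are hypothetical names, LSNs are treated as plain integers rather than compared with XFS_LSN_CMP(), and a counter stands in for the real wake-up call. It is not the patch's implementation.

```c
#include <stdint.h>

typedef uint64_t lsn_t;	/* simplification: a real LSN is a cycle/block pair */

struct pusher {
	lsn_t    target;       /* highest LSN anyone has asked to be pushed to */
	lsn_t    last_pushed;  /* how far the push thread has got so far */
	unsigned wakeups;      /* stands in for actually waking the push thread */
};

/*
 * Called by any thread that needs log space. Only a request beyond the
 * current target moves it and triggers a wake-up; smaller requests are
 * already covered by the push in progress.
 */
static void request_push(struct pusher *p, lsn_t threshold)
{
	if (threshold > p->target) {
		p->target = threshold;
		p->wakeups++;
	}
}

/*
 * One loop iteration of the push thread: advance by at most `batch`,
 * resuming from where the previous iteration stopped instead of
 * restarting from the head of the list. Returns nonzero if work remains.
 */
static int push_step(struct pusher *p, lsn_t batch)
{
	lsn_t end = p->last_pushed + batch;

	if (end > p->target)
		end = p->target;
	p->last_pushed = end;
	return p->last_pushed < p->target;
}
```

Keeping last_pushed between iterations is the "context from loop iteration to loop iteration" mentioned above: however many requesters pile up, there is only ever one walker and it never rescans what it has already handled.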
> > Signed-Off-By: Dave Chinner > --- > fs/xfs/linux-2.6/xfs_super.c | 60 +++++++++++ > fs/xfs/xfs_log.c | 12 ++ > fs/xfs/xfs_mount.c | 6 - > fs/xfs/xfs_mount.h | 10 + > fs/xfs/xfs_trans.h | 1 > fs/xfs/xfs_trans_ail.c | 231 ++++++++++++++++++++++++++++--------------- > fs/xfs/xfs_trans_priv.h | 8 + > fs/xfs/xfsidbg.c | 12 +- > 8 files changed, 247 insertions(+), 93 deletions(-) > > Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_super.c > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_super.c 2007-11-05 10:39:05.000000000 +1100 > +++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_super.c 2007-11-05 14:48:39.871177707 +1100 > @@ -51,6 +51,7 @@ > #include "xfs_vfsops.h" > #include "xfs_version.h" > #include "xfs_log_priv.h" > +#include "xfs_trans_priv.h" > > #include > #include > @@ -765,6 +766,65 @@ xfs_blkdev_issue_flush( > blkdev_issue_flush(buftarg->bt_bdev, NULL); > } > > +/* > + * XFS AIL push thread support > + */ > +void > +xfsaild_wakeup( > + xfs_mount_t *mp, > + xfs_lsn_t threshold_lsn) > +{ > + > + if (XFS_LSN_CMP(threshold_lsn, mp->m_ail.xa_target) > 0) { > + mp->m_ail.xa_target = threshold_lsn; > + wake_up_process(mp->m_ail.xa_task); > + } > +} > + > +int > +xfsaild( > + void *data) > +{ > + xfs_mount_t *mp = (xfs_mount_t *)data; > + xfs_lsn_t last_pushed_lsn = 0; > + long tout = 0; > + > + while (!kthread_should_stop()) { > + if (tout) > + schedule_timeout_interruptible(msecs_to_jiffies(tout)); > + > + /* swsusp */ > + try_to_freeze(); > + > + /* we're either starting or stopping if there is no log */ > + if (!mp->m_log) > + continue; > + > + tout = xfsaild_push(mp, &last_pushed_lsn); > + } > + > + return 0; > +} /* xfsaild */ > + > +void > +xfsaild_start( > + xfs_mount_t *mp) > +{ > + mp->m_ail.xa_target = 0; > + mp->m_ail.xa_task = kthread_run(xfsaild, mp, "xfsaild"); > + ASSERT(!IS_ERR(mp->m_ail.xa_task)); > + /* XXX: should return error but nowhere to do it */ > +} > + > +void > +xfsaild_stop( > + 
xfs_mount_t *mp) > +{ > + kthread_stop(mp->m_ail.xa_task); > +} > + > + > + > STATIC struct inode * > xfs_fs_alloc_inode( > struct super_block *sb) > Index: 2.6.x-xfs-new/fs/xfs/xfs_log.c > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_log.c 2007-11-02 18:00:19.000000000 +1100 > +++ 2.6.x-xfs-new/fs/xfs/xfs_log.c 2007-11-05 14:07:16.850189316 +1100 > @@ -515,6 +515,12 @@ xfs_log_mount(xfs_mount_t *mp, > mp->m_log = xlog_alloc_log(mp, log_target, blk_offset, num_bblks); > > /* > + * Initialize the AIL now we have a log. > + */ > + spin_lock_init(&mp->m_ail_lock); > + xfs_trans_ail_init(mp); > + > + /* > * skip log recovery on a norecovery mount. pretend it all > * just worked. > */ > @@ -530,7 +536,7 @@ xfs_log_mount(xfs_mount_t *mp, > mp->m_flags |= XFS_MOUNT_RDONLY; > if (error) { > cmn_err(CE_WARN, "XFS: log mount/recovery failed: error %d", error); > - xlog_dealloc_log(mp->m_log); > + xfs_log_unmount_dealloc(mp); > return error; > } > } > @@ -722,10 +728,14 @@ xfs_log_unmount_write(xfs_mount_t *mp) > > /* > * Deallocate log structures for unmount/relocation. > + * > + * We need to stop the aild from running before we destroy > + * and deallocate the log as the aild references the log. > */ > void > xfs_log_unmount_dealloc(xfs_mount_t *mp) > { > + xfs_trans_ail_destroy(mp); > xlog_dealloc_log(mp->m_log); > } > > Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.c > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.c 2007-11-02 13:44:50.000000000 +1100 > +++ 2.6.x-xfs-new/fs/xfs/xfs_mount.c 2007-11-05 14:12:22.554601173 +1100 > @@ -137,15 +137,9 @@ xfs_mount_init(void) > mp->m_flags |= XFS_MOUNT_NO_PERCPU_SB; > } > > - spin_lock_init(&mp->m_ail_lock); > spin_lock_init(&mp->m_sb_lock); > mutex_init(&mp->m_ilock); > mutex_init(&mp->m_growlock); > - /* > - * Initialize the AIL. 
> - */ > - xfs_trans_ail_init(mp); > - > atomic_set(&mp->m_active_trans, 0); > > return mp; > Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.h > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.h 2007-10-16 08:52:58.000000000 +1000 > +++ 2.6.x-xfs-new/fs/xfs/xfs_mount.h 2007-11-05 14:14:42.652456849 +1100 > @@ -219,12 +219,18 @@ extern void xfs_icsb_sync_counters_flags > #define xfs_icsb_sync_counters_flags(mp, flags) do { } while (0) > #endif > > +typedef struct xfs_ail { > + xfs_ail_entry_t xa_ail; > + uint xa_gen; > + struct task_struct *xa_task; > + xfs_lsn_t xa_target; > +} xfs_ail_t; > + > typedef struct xfs_mount { > struct super_block *m_super; > xfs_tid_t m_tid; /* next unused tid for fs */ > spinlock_t m_ail_lock; /* fs AIL mutex */ > - xfs_ail_entry_t m_ail; /* fs active log item list */ > - uint m_ail_gen; /* fs AIL generation count */ > + xfs_ail_t m_ail; /* fs active log item list */ > xfs_sb_t m_sb; /* copy of fs superblock */ > spinlock_t m_sb_lock; /* sb counter lock */ > struct xfs_buf *m_sb_bp; /* buffer for superblock */ > Index: 2.6.x-xfs-new/fs/xfs/xfs_trans.h > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_trans.h 2007-11-02 13:44:46.000000000 +1100 > +++ 2.6.x-xfs-new/fs/xfs/xfs_trans.h 2007-11-05 14:01:13.205272667 +1100 > @@ -993,6 +993,7 @@ int _xfs_trans_commit(xfs_trans_t *, > #define xfs_trans_commit(tp, flags) _xfs_trans_commit(tp, flags, NULL) > void xfs_trans_cancel(xfs_trans_t *, int); > void xfs_trans_ail_init(struct xfs_mount *); > +void xfs_trans_ail_destroy(struct xfs_mount *); > xfs_lsn_t xfs_trans_push_ail(struct xfs_mount *, xfs_lsn_t); > xfs_lsn_t xfs_trans_tail_ail(struct xfs_mount *); > void xfs_trans_unlocked_item(struct xfs_mount *, > Index: 2.6.x-xfs-new/fs/xfs/xfs_trans_ail.c > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_trans_ail.c 2007-10-02 
16:01:48.000000000 +1000 > +++ 2.6.x-xfs-new/fs/xfs/xfs_trans_ail.c 2007-11-05 14:46:44.206327966 +1100 > @@ -57,7 +57,7 @@ xfs_trans_tail_ail( > xfs_log_item_t *lip; > > spin_lock(&mp->m_ail_lock); > - lip = xfs_ail_min(&(mp->m_ail)); > + lip = xfs_ail_min(&(mp->m_ail.xa_ail)); > if (lip == NULL) { > lsn = (xfs_lsn_t)0; > } else { > @@ -71,25 +71,22 @@ xfs_trans_tail_ail( > /* > * xfs_trans_push_ail > * > - * This routine is called to move the tail of the AIL > - * forward. It does this by trying to flush items in the AIL > - * whose lsns are below the given threshold_lsn. > + * This routine is called to move the tail of the AIL forward. It does this by > + * trying to flush items in the AIL whose lsns are below the given > + * threshold_lsn. > * > - * The routine returns the lsn of the tail of the log. > + * The push is run asynchronously in a separate thread, so we return the tail > + * of the log right now instead of the tail after the push. This means we will > + * either continue right away, or we will sleep waiting on the async thread to > + * do its work. > */ > xfs_lsn_t > xfs_trans_push_ail( > xfs_mount_t *mp, > xfs_lsn_t threshold_lsn) > { > - xfs_lsn_t lsn; > xfs_log_item_t *lip; > int gen; > - int restarts; > - int lock_result; > - int flush_log; > - > -#define XFS_TRANS_PUSH_AIL_RESTARTS 1000 > > spin_lock(&mp->m_ail_lock); > lip = xfs_trans_first_ail(mp, &gen); > @@ -100,57 +97,105 @@ xfs_trans_push_ail( > spin_unlock(&mp->m_ail_lock); > return (xfs_lsn_t)0; > } > + if (XFS_LSN_CMP(threshold_lsn, mp->m_ail.xa_target) > 0) > + xfsaild_wakeup(mp, threshold_lsn); > + spin_unlock(&mp->m_ail_lock); > + return (xfs_lsn_t)lip->li_lsn; > +} > + > +/* > + * Return the item in the AIL with the current lsn. > + * Return the current tree generation number for use > + * in calls to xfs_trans_next_ail(). 
> + */ > +STATIC xfs_log_item_t * > +xfs_trans_first_push_ail( > + xfs_mount_t *mp, > + int *gen, > + xfs_lsn_t lsn) > +{ > + xfs_log_item_t *lip; > + > + lip = xfs_ail_min(&(mp->m_ail.xa_ail)); > + *gen = (int)mp->m_ail.xa_gen; > + while (lip && (XFS_LSN_CMP(lip->li_lsn, lsn) < 0)) > + lip = lip->li_ail.ail_forw; > + > + return (lip); > +} > + > +/* > + * Function that does the work of pushing on the AIL > + */ > +long > +xfsaild_push( > + xfs_mount_t *mp, > + xfs_lsn_t *last_lsn) > +{ > + long tout = 100; /* milliseconds */ > + xfs_lsn_t last_pushed_lsn = *last_lsn; > + xfs_lsn_t target = mp->m_ail.xa_target; > + xfs_lsn_t lsn; > + xfs_log_item_t *lip; > + int lock_result; > + int gen; > + int restarts; > + int flush_log, count, stuck; > + > +#define XFS_TRANS_PUSH_AIL_RESTARTS 10 > + > + spin_lock(&mp->m_ail_lock); > + lip = xfs_trans_first_push_ail(mp, &gen, *last_lsn); > + if (lip == NULL || XFS_FORCED_SHUTDOWN(mp)) { > + /* > + * AIL is empty or our push has reached the end. > + */ > + spin_unlock(&mp->m_ail_lock); > + last_pushed_lsn = 0; > + goto out; > + } > > XFS_STATS_INC(xs_push_ail); > > /* > * While the item we are looking at is below the given threshold > - * try to flush it out. Make sure to limit the number of times > - * we allow xfs_trans_next_ail() to restart scanning from the > - * beginning of the list. We'd like not to stop until we've at least > + * try to flush it out. We'd like not to stop until we've at least > * tried to push on everything in the AIL with an LSN less than > - * the given threshold. However, we may give up before that if > - * we realize that we've been holding the AIL lock for 'too long', > - * blocking interrupts. Currently, too long is < 500us roughly. > + * the given threshold. > + * > + * However, we will stop after a certain number of pushes and wait > + * for a reduced timeout to fire before pushing further. 
This > + * prevents us from spinning when we can't do anything or there is > + * lots of contention on the AIL lists. > */ > - flush_log = 0; > - restarts = 0; > - while (((restarts < XFS_TRANS_PUSH_AIL_RESTARTS) && > - (XFS_LSN_CMP(lip->li_lsn, threshold_lsn) < 0))) { > + tout = 10; > + lsn = lip->li_lsn; > + flush_log = stuck = count = restarts = 0; > + while ((XFS_LSN_CMP(lip->li_lsn, target) < 0)) { > /* > - * If we can lock the item without sleeping, unlock > - * the AIL lock and flush the item. Then re-grab the > - * AIL lock so we can look for the next item on the > - * AIL. Since we unlock the AIL while we flush the > - * item, the next routine may start over again at the > - * the beginning of the list if anything has changed. > - * That is what the generation count is for. > + * If we can lock the item without sleeping, unlock the AIL > + * lock and flush the item. Then re-grab the AIL lock so we > + * can look for the next item on the AIL. List changes are > + * handled by the AIL lookup functions internally. > * > - * If we can't lock the item, either its holder will flush > - * it or it is already being flushed or it is being relogged. > - * In any of these case it is being taken care of and we > - * can just skip to the next item in the list. > + * If we can't lock the item, either its holder will flush it > + * or it is already being flushed or it is being relogged. In > + * any of these cases it is being taken care of and we can just > + * skip to the next item in the list. 
> */ > lock_result = IOP_TRYLOCK(lip); > + spin_unlock(&mp->m_ail_lock); > switch (lock_result) { > case XFS_ITEM_SUCCESS: > - spin_unlock(&mp->m_ail_lock); > XFS_STATS_INC(xs_push_ail_success); > IOP_PUSH(lip); > - spin_lock(&mp->m_ail_lock); > + last_pushed_lsn = lsn; > break; > > case XFS_ITEM_PUSHBUF: > - spin_unlock(&mp->m_ail_lock); > XFS_STATS_INC(xs_push_ail_pushbuf); > -#ifdef XFSRACEDEBUG > - delay_for_intr(); > - delay(300); > -#endif > - ASSERT(lip->li_ops->iop_pushbuf); > - ASSERT(lip); > IOP_PUSHBUF(lip); > - spin_lock(&mp->m_ail_lock); > + last_pushed_lsn = lsn; > break; > > case XFS_ITEM_PINNED: > @@ -160,10 +205,14 @@ xfs_trans_push_ail( > > case XFS_ITEM_LOCKED: > XFS_STATS_INC(xs_push_ail_locked); > + last_pushed_lsn = lsn; > + stuck++; > break; > > case XFS_ITEM_FLUSHING: > XFS_STATS_INC(xs_push_ail_flushing); > + last_pushed_lsn = lsn; > + stuck++; > break; > > default: > @@ -171,19 +220,26 @@ xfs_trans_push_ail( > break; > } > > + spin_lock(&mp->m_ail_lock); > + count++; > + /* Too many items we can't do anything with? */ > + if (stuck > 100) > + break; > + /* we're either starting or stopping if there is no log */ > + if (!mp->m_log) > + break; > + /* should we bother continuing? */ > + if (XFS_FORCED_SHUTDOWN(mp)) > + break; > + /* get the next item */ > lip = xfs_trans_next_ail(mp, lip, &gen, &restarts); > - if (lip == NULL) { > + if (lip == NULL) > break; > - } > - if (XFS_FORCED_SHUTDOWN(mp)) { > - /* > - * Just return if we shut down during the last try. > - */ > - spin_unlock(&mp->m_ail_lock); > - return (xfs_lsn_t)0; > - } > - > + if (restarts > XFS_TRANS_PUSH_AIL_RESTARTS) > + break; > + lsn = lip->li_lsn; > } > + spin_unlock(&mp->m_ail_lock); > > if (flush_log) { > /* > @@ -191,22 +247,33 @@ xfs_trans_push_ail( > * push out the log so it will become unpinned and > * move forward in the AIL. 
> */ > - spin_unlock(&mp->m_ail_lock); > XFS_STATS_INC(xs_push_ail_flush); > xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE); > - spin_lock(&mp->m_ail_lock); > } > > - lip = xfs_ail_min(&(mp->m_ail)); > - if (lip == NULL) { > - lsn = (xfs_lsn_t)0; > - } else { > - lsn = lip->li_lsn; > + /* > + * We reached the target so wait a bit longer for I/O to complete and > + * remove pushed items from the AIL before we start the next scan from > + * the start of the AIL. > + */ > + if ((XFS_LSN_CMP(lsn, target) >= 0)) { > + tout += 20; > + last_pushed_lsn = 0; > + } else if ((restarts > XFS_TRANS_PUSH_AIL_RESTARTS) || > + (count && (count < (stuck + 10)))) { > + /* > + * Either there is a lot of contention on the AIL or we > + * found a lot of items we couldn't do anything with. > + * Backoff a bit more to allow some I/O to complete before > + * continuing from where we were. > + */ > + tout += 10; > } > > - spin_unlock(&mp->m_ail_lock); > - return lsn; > -} /* xfs_trans_push_ail */ > +out: > + *last_lsn = last_pushed_lsn; > + return tout; > +} /* xfsaild_push */ > > > /* > @@ -247,7 +314,7 @@ xfs_trans_unlocked_item( > * the call to xfs_log_move_tail() doesn't do anything if there's > * not enough free space to wake people up so we're safe calling it. 
> */ > - min_lip = xfs_ail_min(&mp->m_ail); > + min_lip = xfs_ail_min(&mp->m_ail.xa_ail); > > if (min_lip == lip) > xfs_log_move_tail(mp, 1); > @@ -279,7 +346,7 @@ xfs_trans_update_ail( > xfs_log_item_t *dlip=NULL; > xfs_log_item_t *mlip; /* ptr to minimum lip */ > > - ailp = &(mp->m_ail); > + ailp = &(mp->m_ail.xa_ail); > mlip = xfs_ail_min(ailp); > > if (lip->li_flags & XFS_LI_IN_AIL) { > @@ -292,10 +359,10 @@ xfs_trans_update_ail( > lip->li_lsn = lsn; > > xfs_ail_insert(ailp, lip); > - mp->m_ail_gen++; > + mp->m_ail.xa_gen++; > > if (mlip == dlip) { > - mlip = xfs_ail_min(&(mp->m_ail)); > + mlip = xfs_ail_min(&(mp->m_ail.xa_ail)); > spin_unlock(&mp->m_ail_lock); > xfs_log_move_tail(mp, mlip->li_lsn); > } else { > @@ -330,7 +397,7 @@ xfs_trans_delete_ail( > xfs_log_item_t *mlip; > > if (lip->li_flags & XFS_LI_IN_AIL) { > - ailp = &(mp->m_ail); > + ailp = &(mp->m_ail.xa_ail); > mlip = xfs_ail_min(ailp); > dlip = xfs_ail_delete(ailp, lip); > ASSERT(dlip == lip); > @@ -338,10 +405,10 @@ xfs_trans_delete_ail( > > lip->li_flags &= ~XFS_LI_IN_AIL; > lip->li_lsn = 0; > - mp->m_ail_gen++; > + mp->m_ail.xa_gen++; > > if (mlip == dlip) { > - mlip = xfs_ail_min(&(mp->m_ail)); > + mlip = xfs_ail_min(&(mp->m_ail.xa_ail)); > spin_unlock(&mp->m_ail_lock); > xfs_log_move_tail(mp, (mlip ? 
mlip->li_lsn : 0)); > } else { > @@ -379,10 +446,10 @@ xfs_trans_first_ail( > { > xfs_log_item_t *lip; > > - lip = xfs_ail_min(&(mp->m_ail)); > - *gen = (int)mp->m_ail_gen; > + lip = xfs_ail_min(&(mp->m_ail.xa_ail)); > + *gen = (int)mp->m_ail.xa_gen; > > - return (lip); > + return lip; > } > > /* > @@ -402,11 +469,11 @@ xfs_trans_next_ail( > xfs_log_item_t *nlip; > > ASSERT(mp && lip && gen); > - if (mp->m_ail_gen == *gen) { > - nlip = xfs_ail_next(&(mp->m_ail), lip); > + if (mp->m_ail.xa_gen == *gen) { > + nlip = xfs_ail_next(&(mp->m_ail.xa_ail), lip); > } else { > - nlip = xfs_ail_min(&(mp->m_ail)); > - *gen = (int)mp->m_ail_gen; > + nlip = xfs_ail_min(&(mp->m_ail).xa_ail); > + *gen = (int)mp->m_ail.xa_gen; > if (restarts != NULL) { > XFS_STATS_INC(xs_push_ail_restarts); > (*restarts)++; > @@ -435,8 +502,16 @@ void > xfs_trans_ail_init( > xfs_mount_t *mp) > { > - mp->m_ail.ail_forw = (xfs_log_item_t*)&(mp->m_ail); > - mp->m_ail.ail_back = (xfs_log_item_t*)&(mp->m_ail); > + mp->m_ail.xa_ail.ail_forw = (xfs_log_item_t*)&mp->m_ail.xa_ail; > + mp->m_ail.xa_ail.ail_back = (xfs_log_item_t*)&mp->m_ail.xa_ail; > + xfsaild_start(mp); > +} > + > +void > +xfs_trans_ail_destroy( > + xfs_mount_t *mp) > +{ > + xfsaild_stop(mp); > } > > /* > Index: 2.6.x-xfs-new/fs/xfs/xfs_trans_priv.h > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_trans_priv.h 2007-10-02 16:01:48.000000000 +1000 > +++ 2.6.x-xfs-new/fs/xfs/xfs_trans_priv.h 2007-11-05 14:02:18.784782356 +1100 > @@ -57,4 +57,12 @@ struct xfs_log_item *xfs_trans_next_ail( > struct xfs_log_item *, int *, int *); > > > +/* > + * AIL push thread support > + */ > +long xfsaild_push(struct xfs_mount *, xfs_lsn_t *); > +void xfsaild_wakeup(struct xfs_mount *, xfs_lsn_t); > +void xfsaild_start(struct xfs_mount *); > +void xfsaild_stop(struct xfs_mount *); > + > #endif /* __XFS_TRANS_PRIV_H__ */ > Index: 2.6.x-xfs-new/fs/xfs/xfsidbg.c > 
=================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfsidbg.c 2007-11-02 13:44:50.000000000 +1100 > +++ 2.6.x-xfs-new/fs/xfs/xfsidbg.c 2007-11-05 14:50:43.099049624 +1100 > @@ -6220,13 +6220,13 @@ xfsidbg_xaildump(xfs_mount_t *mp) > }; > int count; > > - if ((mp->m_ail.ail_forw == NULL) || > - (mp->m_ail.ail_forw == (xfs_log_item_t *)&mp->m_ail)) { > + if ((mp->m_ail.xa_ail.ail_forw == NULL) || > + (mp->m_ail.xa_ail.ail_forw == (xfs_log_item_t *)&mp->m_ail.xa_ail)) { > kdb_printf("AIL is empty\n"); > return; > } > kdb_printf("AIL for mp 0x%p, oldest first\n", mp); > - lip = (xfs_log_item_t*)mp->m_ail.ail_forw; > + lip = (xfs_log_item_t*)mp->m_ail.xa_ail.ail_forw; > for (count = 0; lip; count++) { > kdb_printf("[%d] type %s ", count, xfsidbg_item_type_str(lip)); > printflags((uint)(lip->li_flags), li_flags, "flags:"); > @@ -6255,7 +6255,7 @@ xfsidbg_xaildump(xfs_mount_t *mp) > break; > } > > - if (lip->li_ail.ail_forw == (xfs_log_item_t*)&mp->m_ail) { > + if (lip->li_ail.ail_forw == (xfs_log_item_t*)&mp->m_ail.xa_ail) { > lip = NULL; > } else { > lip = lip->li_ail.ail_forw; > @@ -6312,9 +6312,9 @@ xfsidbg_xmount(xfs_mount_t *mp) > > kdb_printf("xfs_mount at 0x%p\n", mp); > kdb_printf("tid 0x%x ail_lock 0x%p &ail 0x%p\n", > - mp->m_tid, &mp->m_ail_lock, &mp->m_ail); > + mp->m_tid, &mp->m_ail_lock, &mp->m_ail.xa_ail); > kdb_printf("ail_gen 0x%x &sb 0x%p\n", > - mp->m_ail_gen, &mp->m_sb); > + mp->m_ail.xa_gen, &mp->m_sb); > kdb_printf("sb_lock 0x%p sb_bp 0x%p dev 0x%x logdev 0x%x rtdev 0x%x\n", > &mp->m_sb_lock, mp->m_sb_bp, > mp->m_ddev_targp ? 
mp->m_ddev_targp->bt_dev : 0, > From owner-xfs@oss.sgi.com Thu Nov 8 17:02:30 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 08 Nov 2007 17:02:34 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_66, T_STOX_BOUND_090909_B autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA912QOj021135 for ; Thu, 8 Nov 2007 17:02:29 -0800 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA18984; Fri, 9 Nov 2007 12:02:18 +1100 Message-ID: <4733B1CA.9030109@sgi.com> Date: Fri, 09 Nov 2007 12:03:06 +1100 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.6 (X11/20070728) MIME-Version: 1.0 To: Christoph Hellwig CC: xfs-dev , xfs-oss Subject: Re: [PATCH] Turn off XBF_READ_AHEAD in io completion References: <47296FF7.8080607@sgi.com> <20071101100012.GA20065@infradead.org> In-Reply-To: <20071101100012.GA20065@infradead.org> Content-Type: multipart/mixed; boundary="------------020301010005050909020108" X-Virus-Scanned: ClamAV 0.91.2/4717/Thu Nov 8 16:05:49 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13597 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs This is a multi-part message in MIME format. --------------020301010005050909020108 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Christoph Hellwig wrote: > On Thu, Nov 01, 2007 at 05:19:35PM +1100, Lachlan McIlroy wrote: >> Read-ahead of an inode cluster will set XBF_READ_AHEAD in the buffer. 
>> If we don't remove the flag it will still be set when we flush the >> buffer back to disk. Not sure if leaving this flag set causes any >> serious problems but it does trigger an assert. > > It might be better if such temporary flags never actually make it to > bp->b_flags. Just pass down a flags variable all the way to > _xfs_buf_ioapply and keep the flags just for this I/O separate from > those that are permanent and in bp->b_flags. > Okay, I've done that (new patch attached). It's certainly not as clean as the last patch. --------------020301010005050909020108 Content-Type: text/x-patch; name="readahead.diff" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="readahead.diff" --- fs/xfs/linux-2.6/xfs_buf.c_1.247 2007-10-29 16:01:29.000000000 +1100 +++ fs/xfs/linux-2.6/xfs_buf.c 2007-11-02 13:39:48.000000000 +1100 @@ -1020,7 +1020,7 @@ xfs_buf_iodone_work( (bp->b_flags & (XBF_ORDERED|XBF_ASYNC)) == (XBF_ORDERED|XBF_ASYNC)) { XB_TRACE(bp, "ordered_retry", bp->b_iodone); bp->b_flags &= ~XBF_ORDERED; - xfs_buf_iorequest(bp); + xfs_buf_iorequest(bp, bp->b_flags); } else if (bp->b_iodone) (*(bp->b_iodone))(bp); else if (bp->b_flags & XBF_ASYNC) @@ -1082,9 +1082,9 @@ xfs_buf_iostart( } bp->b_flags &= ~(XBF_READ | XBF_WRITE | XBF_ASYNC | XBF_DELWRI | \ - XBF_READ_AHEAD | _XBF_RUN_QUEUES); + _XBF_RUN_QUEUES); bp->b_flags |= flags & (XBF_READ | XBF_WRITE | XBF_ASYNC | \ - XBF_READ_AHEAD | _XBF_RUN_QUEUES); + _XBF_RUN_QUEUES); BUG_ON(bp->b_bn == XFS_BUF_DADDR_NULL); @@ -1093,7 +1093,7 @@ xfs_buf_iostart( * a shutdown situation, for example). */ status = (flags & XBF_WRITE) ? - xfs_buf_iostrategy(bp) : xfs_buf_iorequest(bp); + xfs_buf_iostrategy(bp, flags) : xfs_buf_iorequest(bp, flags); /* Wait for I/O if we are not an async request. 
* Note: async I/O request completion will release the buffer, @@ -1172,7 +1172,8 @@ xfs_buf_bio_end_io( STATIC void _xfs_buf_ioapply( - xfs_buf_t *bp) + xfs_buf_t *bp, + xfs_buf_flags_t flags) { int i, rw, map_i, total_nr_pages, nr_pages; struct bio *bio; @@ -1194,7 +1195,7 @@ _xfs_buf_ioapply( rw = (bp->b_flags & XBF_WRITE) ? WRITE_SYNC : READ_SYNC; } else { rw = (bp->b_flags & XBF_WRITE) ? WRITE : - (bp->b_flags & XBF_READ_AHEAD) ? READA : READ; + (flags & XBF_READ_AHEAD) ? READA : READ; } /* Special code path for reading a sub page size buffer in -- @@ -1279,7 +1280,8 @@ submit_io: int xfs_buf_iorequest( - xfs_buf_t *bp) + xfs_buf_t *bp, + xfs_buf_flags_t flags) { XB_TRACE(bp, "iorequest", 0); @@ -1299,7 +1301,7 @@ xfs_buf_iorequest( * all the I/O from calling xfs_buf_ioend too early. */ atomic_set(&bp->b_io_remaining, 1); - _xfs_buf_ioapply(bp); + _xfs_buf_ioapply(bp, flags); _xfs_buf_ioend(bp, 0); xfs_buf_rele(bp); @@ -1775,7 +1777,7 @@ xfsbufd( ASSERT(target == bp->b_target); list_del_init(&bp->b_list); - xfs_buf_iostrategy(bp); + xfs_buf_iostrategy(bp, bp->b_flags); count++; } @@ -1819,7 +1821,7 @@ xfs_flush_buftarg( else list_del_init(&bp->b_list); - xfs_buf_iostrategy(bp); + xfs_buf_iostrategy(bp, bp->b_flags); } if (wait) --- fs/xfs/linux-2.6/xfs_buf.h_1.122 2007-11-02 13:34:39.000000000 +1100 +++ fs/xfs/linux-2.6/xfs_buf.h 2007-11-02 13:38:35.000000000 +1100 @@ -191,14 +191,14 @@ extern void xfs_buf_unlock(xfs_buf_t *); extern void xfs_buf_ioend(xfs_buf_t *, int); extern void xfs_buf_ioerror(xfs_buf_t *, int); extern int xfs_buf_iostart(xfs_buf_t *, xfs_buf_flags_t); -extern int xfs_buf_iorequest(xfs_buf_t *); +extern int xfs_buf_iorequest(xfs_buf_t *, xfs_buf_flags_t); extern int xfs_buf_iowait(xfs_buf_t *); extern void xfs_buf_iomove(xfs_buf_t *, size_t, size_t, xfs_caddr_t, xfs_buf_rw_t); -static inline int xfs_buf_iostrategy(xfs_buf_t *bp) +static inline int xfs_buf_iostrategy(xfs_buf_t *bp, xfs_buf_flags_t flags) { - return bp->b_strat ? 
bp->b_strat(bp) : xfs_buf_iorequest(bp); + return bp->b_strat ? bp->b_strat(bp) : xfs_buf_iorequest(bp, flags); } static inline int xfs_buf_geterror(xfs_buf_t *bp) @@ -380,7 +380,7 @@ static inline int XFS_bwrite(xfs_buf_t * bp->b_flags |= _XBF_RUN_QUEUES; xfs_buf_delwri_dequeue(bp); - xfs_buf_iostrategy(bp); + xfs_buf_iostrategy(bp, bp->b_flags); if (iowait) { error = xfs_buf_iowait(bp); xfs_buf_relse(bp); @@ -395,7 +395,7 @@ static inline int xfs_bdwrite(void *mp, return xfs_buf_iostart(bp, XBF_DELWRI | XBF_ASYNC); } -#define XFS_bdstrat(bp) xfs_buf_iorequest(bp) +#define XFS_bdstrat(bp) xfs_buf_iorequest(bp, (bp)->b_flags) #define xfs_iowait(bp) xfs_buf_iowait(bp) --- fs/xfs/linux-2.6/xfs_lrw.c_1.271 2007-11-09 11:58:53.000000000 +1100 +++ fs/xfs/linux-2.6/xfs_lrw.c 2007-11-09 11:57:50.000000000 +1100 @@ -878,7 +878,7 @@ xfs_bdstrat_cb(struct xfs_buf *bp) mp = XFS_BUF_FSPRIVATE3(bp, xfs_mount_t *); if (!XFS_FORCED_SHUTDOWN(mp)) { - xfs_buf_iorequest(bp); + xfs_buf_iorequest(bp, bp->b_flags); return 0; } else { xfs_buftrace("XFS__BDSTRAT IOERROR", bp); @@ -912,7 +912,7 @@ xfsbdstrat( * if (XFS_BUF_IS_GRIO(bp)) { */ - xfs_buf_iorequest(bp); + xfs_buf_iorequest(bp, bp->b_flags); return 0; } --------------020301010005050909020108-- From owner-xfs@oss.sgi.com Thu Nov 8 17:08:53 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 08 Nov 2007 17:08:59 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA918kYk022102 for ; Thu, 8 Nov 2007 17:08:52 -0800 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA19109; Fri, 9 Nov 2007 12:08:46 +1100 Message-ID: 
<4733B34F.70407@sgi.com> Date: Fri, 09 Nov 2007 12:09:35 +1100 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.6 (X11/20070728) MIME-Version: 1.0 To: David Chinner CC: xfs-oss , xfs-dev Subject: Re: [patch] Fix broken inode clustering References: <20071108003848.GA66820511@sgi.com> In-Reply-To: <20071108003848.GA66820511@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4717/Thu Nov 8 16:05:49 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13598 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs Looks good Dave. David Chinner wrote: > The radix tree based inode caches did away with the inode cluster hashes, > replacing them with a bunch of masking and gang lookups on the radix tree. > > This masking got broken when moving the code to per-ag radix trees and > indexing by agino # rather than straight inode number. The result is > clustered inode writeback does not cluster and things can go extremely > slowly when there are lots of inodes to write. > > The following patch fixes this up by comparing agino # of the inode > found to the index of the cluster we are looking for. 
> > Signed-off-by: Dave Chinner > Tested-by: Torsten Kaiser > > --- > fs/xfs/xfs_iget.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > Index: 2.6.x-xfs-new/fs/xfs/xfs_iget.c > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_iget.c 2007-11-02 13:44:46.000000000 +1100 > +++ 2.6.x-xfs-new/fs/xfs/xfs_iget.c 2007-11-07 13:08:42.534440675 +1100 > @@ -248,7 +248,7 @@ finish_inode: > icl = NULL; > if (radix_tree_gang_lookup(&pag->pag_ici_root, (void**)&iq, > first_index, 1)) { > - if ((iq->i_ino & mask) == first_index) > + if ((XFS_INO_TO_AGINO(mp, iq->i_ino) & mask) == first_index) > icl = iq->i_cluster; > } > > > > From owner-xfs@oss.sgi.com Thu Nov 8 19:17:04 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 08 Nov 2007 19:17:07 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA93GxfV001867 for ; Thu, 8 Nov 2007 19:17:02 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA21450; Fri, 9 Nov 2007 14:16:58 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lA93GvdD98944657; Fri, 9 Nov 2007 14:16:58 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lA93Gtnx99032984; Fri, 9 Nov 2007 14:16:55 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Fri, 9 Nov 2007 14:16:55 +1100 From: David Chinner To: Lachlan McIlroy Cc: David Chinner , xfs-oss , xfs-dev Subject: Re: [PATCH, RFC] Move AIL pushing into a separate 
thread
Message-ID: <20071109031655.GQ66820511@sgi.com>
References: <20071105050706.GW66820511@sgi.com> <4733AB27.70208@sgi.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4733AB27.70208@sgi.com>
User-Agent: Mutt/1.4.2.1i
X-Virus-Scanned: ClamAV 0.91.2/4719/Thu Nov 8 17:49:00 2007 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 13599
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: dgc@sgi.com
Precedence: bulk
X-list: xfs

On Fri, Nov 09, 2007 at 11:34:47AM +1100, Lachlan McIlroy wrote:
> I like the sound of this Dave. I'm still going through the code in
> detail.
>
> Could we convert the ail lock into a mutex to ease the load? I know
> it may not improve throughput but it would at least relieve the CPUs
> to do other stuff.

Most of the time the ail lock is used for very short periods of time
(e.g. less than ten lines of code), so a spin lock is appropriate. What
we are seeing here is too many CPUs holding it for too long trying to
do the work one CPU could easily do. That is, the bug we are seeing
here is the contention on the lock, not the type of lock. If we change
to a sleeping lock, all people will see is a slowdown, and that is
much, much harder to diagnose on a production system than spin lock
contention....

Cheers,

Dave.
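
The point about short critical sections can be sketched outside the kernel. The following is a minimal userspace illustration in C, not the XFS code: callers hold a spin lock only long enough to raise a shared push target, and all the real work is left to a single worker that would be woken. The names push_state, push_state_init, request_push, demo_spin_lock and demo_spin_unlock are hypothetical, the lock is a plain C11 atomic_flag, and LSNs are compared as plain integers here, whereas the patch uses XFS_LSN_CMP().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the AIL push state; not an XFS structure. */
typedef struct push_state {
	atomic_flag	lock;		/* spin lock guarding the fields below */
	int64_t		target;		/* highest LSN anyone asked us to push to */
	int		wakeups;	/* times the worker would have been woken */
} push_state;

void push_state_init(push_state *s)
{
	assert(s != NULL);
	atomic_flag_clear(&s->lock);
	s->target = 0;
	s->wakeups = 0;
}

static void demo_spin_lock(atomic_flag *l)
{
	/* Busy-wait until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;
}

static void demo_spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/*
 * Ask for a push up to 'threshold'. The critical section is a compare
 * and two stores; the caller never scans or flushes anything itself.
 * Returns 1 if the target was raised (the worker would be woken),
 * 0 if an equal or higher target was already pending.
 */
int request_push(push_state *s, int64_t threshold)
{
	int raised = 0;

	assert(s != NULL);
	demo_spin_lock(&s->lock);
	if (threshold > s->target) {
		s->target = threshold;
		s->wakeups++;
		raised = 1;
	}
	demo_spin_unlock(&s->lock);
	return raised;
}
```

With this shape, many CPUs calling request_push() contend only for a few instructions each, which is the situation where a spin lock remains a reasonable choice.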
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Thu Nov 8 21:02:40 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 08 Nov 2007 21:02:43 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA952YG8015282 for ; Thu, 8 Nov 2007 21:02:38 -0800 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA23484; Fri, 9 Nov 2007 16:02:29 +1100 Received: by chook.melbourne.sgi.com (Postfix, from userid 16346) id 2C8B058C38F7; Fri, 9 Nov 2007 16:02:29 +1100 (EST) To: sgi.bugs.xfs@engr.sgi.com Cc: xfs@oss.sgi.com Subject: TAKE 972915 - Fix broken inode cluster setup. Message-Id: <20071109050229.2C8B058C38F7@chook.melbourne.sgi.com> Date: Fri, 9 Nov 2007 16:02:29 +1100 (EST) From: dgc@sgi.com (David Chinner) X-Virus-Scanned: ClamAV 0.91.2/4721/Thu Nov 8 19:47:43 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13600 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Fix broken inode cluster setup. The radix tree based inode caches did away with the inode cluster hashes, replacing them with a bunch of masking and gang lookups on the radix tree. This masking got broken when moving the code to per-ag radix trees and indexing by agino # rather than straight inode number. The result is clustered inode writeback does not cluster and things can go extremely slowly when there are lots of inodes to write. Fix it up by comparing the agino # of the inode we just looked up to the index of the cluster we are looking for. 
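
The masking problem described above can be shown with a small standalone sketch in C. It assumes a simplified inode number layout in which the AG number sits in the bits above a fixed 32-bit AG-relative inode number; the real XFS_INO_TO_AGINO() takes the mount structure and derives the split from it, so DEMO_AGINO_BITS and every demo_* name below are hypothetical. The two in_cluster_* functions mirror the old and new comparisons from the one-line change.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_AGINO_BITS	32	/* hypothetical fixed split, for illustration */

/* Build a full inode number from an AG number and an AG-relative inode. */
uint64_t demo_make_ino(uint32_t agno, uint32_t agino)
{
	return ((uint64_t)agno << DEMO_AGINO_BITS) | agino;
}

/* Strip the AG number, leaving the AG-relative inode number. */
uint32_t demo_ino_to_agino(uint64_t ino)
{
	return (uint32_t)(ino & ((1ULL << DEMO_AGINO_BITS) - 1));
}

/* Mask that rounds an inode number down to the start of its cluster. */
uint64_t demo_cluster_mask(uint32_t inodes_per_cluster)
{
	/* The cluster size must be a non-zero power of two. */
	assert(inodes_per_cluster != 0 &&
	       (inodes_per_cluster & (inodes_per_cluster - 1)) == 0);
	return ~(uint64_t)(inodes_per_cluster - 1);
}

/* Old test: masks the full inode number, so the AG bits leak through. */
int in_cluster_broken(uint64_t ino, uint64_t mask, uint64_t first_index)
{
	return (ino & mask) == first_index;
}

/* New test: compares like with like by reducing to the agino first. */
int in_cluster_fixed(uint64_t ino, uint64_t mask, uint64_t first_index)
{
	return (demo_ino_to_agino(ino) & mask) == first_index;
}
```

Under these assumptions the old comparison still matches for inodes in AG 0, where the full inode number and the agino coincide, but it can never match in any other AG because the AG bits survive the mask. That would be consistent with the symptom described: clusters are not found and writeback proceeds one inode at a time.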
Signed-off-by: Dave Chinner Tested-by: Torsten Kaiser Date: Fri Nov 9 16:02:06 AEDT 2007 Workarea: chook.melbourne.sgi.com:/build/dgc/isms/2.6.x-xfs Inspected by: lachlan@sgi.com The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/linux/2.6.x-xfs-melb Modid: xfs-linux-melb:xfs-kern:30033a fs/xfs/xfs_iget.c - 1.238 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_iget.c.diff?r1=text&tr1=1.238&r2=text&tr2=1.237&f=h - Make the cluster lookup use the agino rather than full inode number so that clusters are correctly built. From owner-xfs@oss.sgi.com Thu Nov 8 21:23:19 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 08 Nov 2007 21:23:38 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00, T_STOX_BOUND_090909_B autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA95NDcJ017668 for ; Thu, 8 Nov 2007 21:23:17 -0800 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA23821; Fri, 9 Nov 2007 16:23:13 +1100 Message-ID: <4733EEF2.9010504@sgi.com> Date: Fri, 09 Nov 2007 16:24:02 +1100 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.6 (X11/20070728) MIME-Version: 1.0 To: xfs-dev , xfs-oss Subject: [PATCH] bulkstat fixups Content-Type: multipart/mixed; boundary="------------080005070805090205050607" X-Virus-Scanned: ClamAV 0.91.2/4721/Thu Nov 8 19:47:43 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13601 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs This is a multi-part message in MIME format. 
--------------080005070805090205050607
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Here's a collection of fixups for bulkstat for all the remaining issues.

- sanity check for NULL user buffer in xfs_ioc_bulkstat[_compat]()

- remove the special case for XFS_IOC_FSBULKSTAT with count == 1. This special case causes bulkstat to fail because the special case uses xfs_bulkstat_single() instead of xfs_bulkstat() and the two functions have different semantics. xfs_bulkstat() will return the next inode after the one supplied while skipping internal inodes (i.e. quota inodes). xfs_bulkstat_single() will only look up the inode supplied and return an error if it is an internal inode.

- in xfs_bulkstat(), need to initialise 'lastino' to the inode supplied so in cases where we return without examining any inodes the scan won't restart back at zero.

- sanity check for valid *ubcountp values. Cannot sanity check for valid ubuffer here because some users of xfs_bulkstat() don't supply a buffer.

- checks against 'ubleft' (the space left in the user's buffer) should be against 'statstruct_size' which is the supplied minimum object size. The mixture of checks against statstruct_size and 0 was one of the reasons we were skipping inodes.

- if the formatter function returns BULKSTAT_RV_NOTHING and an error and the error is not ENOENT or EINVAL then we need to abort the scan. ENOENT is for inodes that are no longer valid and we just skip them. EINVAL is returned if we try to look up an internal inode so we skip them too. For a DMF scan if the inode and DMF attribute cannot fit into the space left in the user's buffer it would return ERANGE. We didn't handle this error and skipped the inode. We would continue to skip inodes until one fitted into the user's buffer or we completed the scan.

- put back the recalculation of agino (that got removed with the last fix) at the end of the while loop.
This is because the code at the start of the loop expects agino to be the last inode examined if it is non-zero. - if we found some inodes but then encountered an error, return success this time and the error next time. If the formatter aborted with ENOMEM we will now return this error but only if we couldn't read any inodes. Previously if we encountered ENOMEM without reading any inodes we returned a zero count and no error which falsely indicated the scan was complete. --------------080005070805090205050607 Content-Type: text/x-patch; name="bulkstat.diff" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="bulkstat.diff" --- fs/xfs/linux-2.6/xfs_ioctl.c_1.158 2007-11-09 15:51:03.000000000 +1100 +++ fs/xfs/linux-2.6/xfs_ioctl.c 2007-11-09 11:57:50.000000000 +1100 @@ -1024,24 +1024,20 @@ xfs_ioc_bulkstat( if ((count = bulkreq.icount) <= 0) return -XFS_ERROR(EINVAL); + if (bulkreq.ubuffer == NULL) + return -XFS_ERROR(EINVAL); + if (cmd == XFS_IOC_FSINUMBERS) error = xfs_inumbers(mp, &inlast, &count, bulkreq.ubuffer, xfs_inumbers_fmt); else if (cmd == XFS_IOC_FSBULKSTAT_SINGLE) error = xfs_bulkstat_single(mp, &inlast, bulkreq.ubuffer, &done); - else { /* XFS_IOC_FSBULKSTAT */ - if (count == 1 && inlast != 0) { - inlast++; - error = xfs_bulkstat_single(mp, &inlast, - bulkreq.ubuffer, &done); - } else { - error = xfs_bulkstat(mp, &inlast, &count, - (bulkstat_one_pf)xfs_bulkstat_one, NULL, - sizeof(xfs_bstat_t), bulkreq.ubuffer, - BULKSTAT_FG_QUICK, &done); - } - } + else /* XFS_IOC_FSBULKSTAT */ + error = xfs_bulkstat(mp, &inlast, &count, + (bulkstat_one_pf)xfs_bulkstat_one, NULL, + sizeof(xfs_bstat_t), bulkreq.ubuffer, + BULKSTAT_FG_QUICK, &done); if (error) return -error; --- fs/xfs/linux-2.6/xfs_ioctl32.c_1.23 2007-11-02 14:27:11.000000000 +1100 +++ fs/xfs/linux-2.6/xfs_ioctl32.c 2007-11-02 14:13:27.000000000 +1100 @@ -292,6 +292,9 @@ xfs_ioc_bulkstat_compat( if ((count = bulkreq.icount) <= 0) return -XFS_ERROR(EINVAL); + if (bulkreq.ubuffer == 
NULL) + return -XFS_ERROR(EINVAL); + if (cmd == XFS_IOC_FSINUMBERS) error = xfs_inumbers(mp, &inlast, &count, bulkreq.ubuffer, xfs_inumbers_fmt_compat); --- fs/xfs/xfs_itable.c_1.157 2007-10-25 17:22:09.000000000 +1000 +++ fs/xfs/xfs_itable.c 2007-11-01 17:22:28.000000000 +1100 @@ -353,7 +353,7 @@ xfs_bulkstat( xfs_inobt_rec_incore_t *irbp; /* current irec buffer pointer */ xfs_inobt_rec_incore_t *irbuf; /* start of irec buffer */ xfs_inobt_rec_incore_t *irbufend; /* end of good irec buffer entries */ - xfs_ino_t lastino=0; /* last inode number returned */ + xfs_ino_t lastino; /* last inode number returned */ int nbcluster; /* # of blocks in a cluster */ int nicluster; /* # of inodes in a cluster */ int nimask; /* mask for inode clusters */ @@ -373,6 +373,7 @@ xfs_bulkstat( * Get the last inode value, see if there's nothing to do. */ ino = (xfs_ino_t)*lastinop; + lastino = ino; dip = NULL; agno = XFS_INO_TO_AGNO(mp, ino); agino = XFS_INO_TO_AGINO(mp, ino); @@ -382,6 +383,9 @@ xfs_bulkstat( *ubcountp = 0; return 0; } + if (!ubcountp || *ubcountp <= 0) { + return EINVAL; + } ubcount = *ubcountp; /* statstruct's */ ubleft = ubcount * statstruct_size; /* bytes */ *ubcountp = ubelem = 0; @@ -560,7 +564,7 @@ xfs_bulkstat( * Now process this chunk of inodes. */ for (agino = irbp->ir_startino, chunkidx = clustidx = 0; - ubleft > 0 && + ubleft >= statstruct_size && irbp->ir_freecount < XFS_INODES_PER_CHUNK; chunkidx++, clustidx++, agino++) { ASSERT(chunkidx < XFS_INODES_PER_CHUNK); @@ -663,15 +667,13 @@ xfs_bulkstat( ubleft, private_data, bno, &ubused, dip, &fmterror); if (fmterror == BULKSTAT_RV_NOTHING) { - if (error == EFAULT) { - ubleft = 0; - rval = error; - break; - } - else if (error == ENOMEM) + if (error && error != ENOENT && + error != EINVAL) { ubleft = 0; - else - lastino = ino; + rval = error; + break; + } + lastino = ino; continue; } if (fmterror == BULKSTAT_RV_GIVEUP) { @@ -694,11 +696,12 @@ xfs_bulkstat( /* * Set up for the next loop iteration. 
*/ - if (ubleft > 0) { + if (ubleft >= statstruct_size) { if (end_of_ag) { agno++; agino = 0; - } + } else + agino = XFS_INO_TO_AGINO(mp, lastino); } else break; } @@ -707,6 +710,11 @@ xfs_bulkstat( */ kmem_free(irbuf, irbsize); *ubcountp = ubelem; + /* + * Found some inodes, return them now and return the error next time. + */ + if (ubelem) + rval = 0; if (agno >= mp->m_sb.sb_agcount) { /* * If we ran out of filesystem, mark lastino as off --------------080005070805090205050607-- From owner-xfs@oss.sgi.com Thu Nov 8 21:33:57 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 08 Nov 2007 21:34:05 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.1 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA95XoLt018838 for ; Thu, 8 Nov 2007 21:33:56 -0800 Received: from [134.14.55.89] (soarer.melbourne.sgi.com [134.14.55.89]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA24049; Fri, 9 Nov 2007 16:33:52 +1100 Message-ID: <4733F198.1090107@sgi.com> Date: Fri, 09 Nov 2007 16:35:20 +1100 From: Vlad Apostolov User-Agent: Thunderbird 2.0.0.6 (X11/20070728) MIME-Version: 1.0 To: lachlan@sgi.com CC: xfs-dev , xfs-oss Subject: Re: [PATCH] bulkstat fixups References: <4733EEF2.9010504@sgi.com> In-Reply-To: <4733EEF2.9010504@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4721/Thu Nov 8 19:47:43 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13602 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: vapo@sgi.com Precedence: bulk X-list: xfs It is looking good Lachlan. 
I also verified the patch with XFS QA, and Judith reported that it fixed
the xfs_bulkstat() problem of skipping inodes in the last AG.

Regards,
Vlad

Lachlan McIlroy wrote:
> Here's a collection of fixups for bulkstat for all the remaining issues.
>
> - sanity check for NULL user buffer in xfs_ioc_bulkstat[_compat]()
>
> - remove the special case for XFS_IOC_FSBULKSTAT with count == 1. This
>   special case causes bulkstat to fail because the special case uses
>   xfs_bulkstat_single() instead of xfs_bulkstat() and the two functions
>   have different semantics. xfs_bulkstat() will return the next inode
>   after the one supplied while skipping internal inodes (i.e. quota
>   inodes). xfs_bulkstat_single() will only look up the inode supplied
>   and return an error if it is an internal inode.
>
> - in xfs_bulkstat(), need to initialise 'lastino' to the inode supplied
>   so in cases where we return without examining any inodes the scan
>   won't restart back at zero.
>
> - sanity check for valid *ubcountp values. Cannot sanity check for valid
>   ubuffer here because some users of xfs_bulkstat() don't supply a
>   buffer.
>
> - checks against 'ubleft' (the space left in the user's buffer) should
>   be against 'statstruct_size' which is the supplied minimum object
>   size. The mixture of checks against statstruct_size and 0 was one of
>   the reasons we were skipping inodes.
>
> - if the formatter function returns BULKSTAT_RV_NOTHING and an error and
>   the error is not ENOENT or EINVAL then we need to abort the scan.
>   ENOENT is for inodes that are no longer valid and we just skip them.
>   EINVAL is returned if we try to look up an internal inode so we skip
>   them too. For a DMF scan if the inode and DMF attribute cannot fit
>   into the space left in the user's buffer it would return ERANGE. We
>   didn't handle this error and skipped the inode. We would continue to
>   skip inodes until one fitted into the user's buffer or we completed
>   the scan.
>
> - put back the recalculation of agino (that got removed with the last
>   fix) at the end of the while loop. This is because the code at the
>   start of the loop expects agino to be the last inode examined if it is
>   non-zero.
>
> - if we found some inodes but then encountered an error, return success
>   this time and the error next time. If the formatter aborted with
>   ENOMEM we will now return this error but only if we couldn't read any
>   inodes. Previously if we encountered ENOMEM without reading any inodes
>   we returned a zero count and no error which falsely indicated the scan
>   was complete.

From owner-xfs@oss.sgi.com Thu Nov 8 21:53:52 2007
Received: with ECARTIS (v1.0.0; list xfs); Thu, 08 Nov 2007 21:53:56 -0800 (PST)
X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-1.3 required=5.0 tests=AWL,BAYES_00
    autolearn=ham version=3.3.0-r574664
Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130])
    by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA95rnXx021081
    for ; Thu, 8 Nov 2007 21:53:51 -0800
Received: from timothy-shimmins-power-mac-g5.local (boing.melbourne.sgi.com [134.14.55.141])
    by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA24166;
    Fri, 9 Nov 2007 16:40:44 +1100
Message-ID: <4733F301.9020706@sgi.com>
Date: Fri, 09 Nov 2007 16:41:21 +1100
From: Timothy Shimmin
User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728)
MIME-Version: 1.0
To: Andreas Gruenbacher
CC: linux-xfs@oss.sgi.com, Gerald Bringhurst , Brandon Philips
Subject: Re: acl and attr: Fix path walking code
References: <200710281858.24428.agruen@suse.de>
In-Reply-To: <200710281858.24428.agruen@suse.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV 0.91.2/4721/Thu Nov 8 19:47:43 2007 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 13603
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: tes@sgi.com
Precedence: bulk
X-list: xfs

Hi Andreas,

Andreas Gruenbacher wrote:
> Hello,
>
> the tree walking code in acl and attr broke when resolve_symlinks() was
> introduced (by me, unfortunately). Following symlinks passed in on the
> command line is the intended behavior for the tools (unless in -P mode). The
> first version was buggy, and so someone "fixed" it by replacing readlink()
> with realpath() in resolve_symlinks().
>
> The result is that the output of getfattr and getfacl will show pathnames that
> may point anywhere. When processing a directory tree it sometimes is helpful
> to treat symlinks as regular files, but resolving the pathnames is totally
> wrong.
>
> After running into problem after problem with nftw and never ending up with
> even half-way clean code, I think it's time to ditch it altogether and
> replace it with sane code. So here are two patches, one for attr and one for
> acl, that do that.
>
> Files include/walk_tree.h and libmisc/walk_tree.c are identical in both
> patches; that code is shared between the two packages.
>
> Okay to apply?
>
> Thanks,
> Andreas
>

I applied the attr patch and tried it out on xfstests/062
(which I believe was based on one of your tests).

==========================================================
--- 062.out	2006-03-28 12:52:32.000000000 +1000
+++ 062.out.bad	2007-11-09 15:38:09.000000000 +1100
@@ -526,6 +526,10 @@
 user.name=0xbabe
 user.name3=0xdeface
 
+# file: SCRATCH_MNT/lnk
+trusted.name=0xbabe
+trusted.name3=0xdeface
+
 # file: SCRATCH_MNT/dev/b
 trusted.name=0xbabe
 trusted.name3=0xdeface
@@ -562,6 +566,10 @@
 user.1=0x3233
 user.x=0x797a
 
+# file: SCRATCH_MNT/descend/and/ascend
+trusted.9=0x3837
+trusted.a=0x6263
+
 
 *** directory descent without following symlinks
 # file: SCRATCH_MNT/reg
==========================================================

So for the following of symlinks with getfattr -L i.e.
echo "*** directory descent with us following symlinks"
getfattr -h -L -R -m '.' -e hex $SCRATCH_MNT

Looking at the 2nd difference...
It now picks up descend/and/ascend which contains the symlink of
descend/and --> here/up.
So that makes sense, it is following a symlink which it didn't before
and finding a dir, "up" in the linked dir.
Good.

Looking at 1st difference...
It is now showing up "lnk" which is a symlink:
lnk --> dir
So why is it showing this up and yet it is not showing
descend/and (which is a link to here/up)?
So yes we are following symlinks but are we supposed to
just do the symlinks themselves as well?

BTW, do we not allow user EAs on symlinks? (I've forgotten)

--Tim

From owner-xfs@oss.sgi.com Thu Nov 8 23:39:22 2007
Received: with ECARTIS (v1.0.0; list xfs); Thu, 08 Nov 2007 23:39:27 -0800 (PST)
X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-1.3 required=5.0 tests=AWL,BAYES_00
    autolearn=ham version=3.3.0-r574664
Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130])
    by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lA97dJSV032511
    for ; Thu, 8 Nov 2007 23:39:20 -0800
Received: from timothy-shimmins-power-mac-g5.local (boing.melbourne.sgi.com [134.14.55.141])
    by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id SAA26203;
    Fri, 9 Nov 2007 18:39:19 +1100
Message-ID: <47340ECC.4000205@sgi.com>
Date: Fri, 09 Nov 2007 18:39:56 +1100
From: Timothy Shimmin
User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728)
MIME-Version: 1.0
To: Andreas Gruenbacher
CC: linux-xfs@oss.sgi.com, Gerald Bringhurst , Brandon Philips
Subject: Re: acl and attr: Fix path walking code
References: <200710281858.24428.agruen@suse.de>
In-Reply-To: <200710281858.24428.agruen@suse.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV 0.91.2/4723/Thu Nov 8 22:33:05 2007 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 13604
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: tes@sgi.com
Precedence: bulk
X-list: xfs

Andreas Gruenbacher wrote:
> Hello,
>
> the tree walking code in acl and attr broke when resolve_symlinks() was
> introduced (by me, unfortunately). Following symlinks passed in on the
> command line is the intended behavior for the tools (unless in -P mode). The
> first version was buggy, and so someone "fixed" it by replacing readlink()
> with realpath() in resolve_symlinks().
>
> The result is that the output of getfattr and getfacl will show pathnames that
> may point anywhere. When processing a directory tree it sometimes is helpful
> to treat symlinks as regular files, but resolving the pathnames is totally
> wrong.
>
> After running into problem after problem with nftw and never ending up with
> even half-way clean code, I think it's time to ditch it altogether and
> replace it with sane code. So here are two patches, one for attr and one for
> acl, that do that.
>
> Files include/walk_tree.h and libmisc/walk_tree.c are identical in both
> patches; that code is shared between the two packages.
>
> Okay to apply?
>
> Thanks,
> Andreas
>

You mention -L/-P is like chown.
However, -P for getfattr isn't about not walking symlinks to directories,
it's about skipping symlinks altogether, right?

--Tim

From owner-xfs@oss.sgi.com Fri Nov 9 01:14:24 2007
Received: with ECARTIS (v1.0.0; list xfs); Fri, 09 Nov 2007 01:14:32 -0800 (PST)
X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=0.1 required=5.0 tests=AWL,BAYES_20,SPF_HELO_PASS
    autolearn=ham version=3.3.0-r574664
Received: from lucidpixels.com (lucidpixels.com [75.144.35.66])
    by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA99EMsS020232
    for ; Fri, 9 Nov 2007 01:14:24 -0800
Received: by lucidpixels.com (Postfix, from userid 1001)
    id D0FF71C000263; Fri, 9 Nov 2007 04:14:27 -0500 (EST)
Received: from localhost (localhost [127.0.0.1])
    by lucidpixels.com (Postfix) with ESMTP id AEF8D4019521;
    Fri, 9 Nov 2007 04:14:27 -0500 (EST)
Date: Fri, 9 Nov 2007 04:14:27 -0500 (EST)
From: Justin Piszcz
X-X-Sender: jpiszcz@p34.internal.lan
To: Carlos Carvalho
cc: Jeff Lessem , root@c3sl.ufpr.br, Dan Williams ,
    =?iso-8859-1?Q?BERTRAND_Jo=EBl?= , Neil Brown ,
    linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: 2.6.23.1: mdadm/raid5 hung/d-state
In-Reply-To: <18227.33346.994456.270194@fisica.ufpr.br>
Message-ID:
References: <18222.16003.92062.970530@notabene.brown> <47303FB8.7000801@systella.fr>
    <1194398700.2970.18.camel@dwillia2-linux.ch.intel.com> <47314653.80905@Lessem.org>
    <18227.33346.994456.270194@fisica.ufpr.br>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Virus-Scanned: ClamAV 0.91.2/4724/Thu Nov 8 22:48:44 2007 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 13605
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: jpiszcz@lucidpixels.com
Precedence: bulk
X-list: xfs

On Thu, 8 Nov 2007, Carlos Carvalho wrote:

> Jeff Lessem (Jeff@Lessem.org) wrote on 6 November 2007 22:00:
> >Dan Williams wrote:
> > > The following patch, also attached, cleans up cases where the code looks
> > > at sh->ops.pending when it should be looking at the consistent
> > > stack-based snapshot of the operations flags.
> >
> >I tried this patch (against a stock 2.6.23), and it did not work for
> >me. Not only did I/O to the affected RAID5 & XFS partition stop, but
> >also I/O to all other disks. I was not able to capture any debugging
> >information, but I should be able to do that tomorrow when I can hook
> >a serial console to the machine.
> >
> >I'm not sure if my problem is identical to these others, as mine only
> >seems to manifest with RAID5+XFS. The RAID rebuilds with no problem,
> >and I've not had any problems with RAID5+ext3.
>
> Us too! We're stuck trying to build a disk server with several disks
> in a raid5 array, and the rsync from the old machine stops writing to
> the new filesystem. It only happens under heavy IO. We can make it
> lock without rsync, using 8 simultaneous dd's to the array. All IO
> stops, including the resync after a newly created raid or after an
> unclean reboot.
>
> We could not trigger the problem with ext3 or reiser3; it only happens
> with xfs.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

Including the XFS mailing list as well; can you provide more information to
them?
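
For anyone who wants to generate a load similar to the one described in the
quoted report (eight simultaneous dd writers to the array), a minimal sketch
in C follows. It is an illustration under stated assumptions, not the
reporters' actual test: run_writers() and write_zeros() are hypothetical
names, the output files are simply called f0, f1, ... inside a directory the
caller chooses, and nothing here is known to trigger the hang by itself.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Write `blocks` blocks of `block_size` zero bytes to `path`
 * (roughly what "dd if=/dev/zero of=path bs=... count=..." does).
 * Returns 0 on success, -1 on any failure.
 */
static int write_zeros(const char *path, size_t block_size, long blocks)
{
	char *buf = calloc(1, block_size);	/* zero-filled buffer */
	long i;
	int fd;

	if (buf == NULL)
		return -1;
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0) {
		free(buf);
		return -1;
	}
	for (i = 0; i < blocks; i++) {
		size_t done = 0;

		/* write() may be short, so loop until the block is out */
		while (done < block_size) {
			ssize_t n = write(fd, buf + done, block_size - done);

			if (n < 0) {
				if (errno == EINTR)
					continue;
				close(fd);
				free(buf);
				return -1;
			}
			done += (size_t)n;
		}
	}
	free(buf);
	return close(fd);
}

/*
 * Fork `nwriters` children; child N writes `blocks` * `block_size`
 * bytes to <dir>/fN.  All children run concurrently.
 * Returns the number of writers that failed (0 means all succeeded).
 */
int run_writers(const char *dir, int nwriters, size_t block_size, long blocks)
{
	int i, started = 0, failed = 0;

	for (i = 0; i < nwriters; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;			/* could not start more writers */
		if (pid == 0) {
			char path[4096];

			snprintf(path, sizeof(path), "%s/f%d", dir, i);
			_exit(write_zeros(path, block_size, blocks) == 0 ? 0 : 1);
		}
		started++;
	}
	for (i = 0; i < started; i++) {
		int status;

		if (wait(&status) < 0 || !WIFEXITED(status) ||
		    WEXITSTATUS(status) != 0)
			failed++;
	}
	return failed + (nwriters - started);
}
```

A caller would do something like run_writers("/mnt/array", 8, 1 << 20, 1024),
where /mnt/array is a placeholder for the mount point of the filesystem under
test and the block size and count are arbitrary. Forked processes rather than
threads are used so that each writer shows up as its own task in a sysrq-t
dump, which is an assumption about what is convenient, not a requirement.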

From owner-xfs@oss.sgi.com Fri Nov 9 06:37:11 2007
Received: with ECARTIS (v1.0.0; list xfs); Fri, 09 Nov 2007 06:37:23 -0800 (PST)
X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham
    version=3.3.0-r574664
Received: from py-out-1112.google.com (py-out-1112.google.com [64.233.166.182])
    by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lA9Eb5R1010445
    for ; Fri, 9 Nov 2007 06:37:08 -0800
Received: by py-out-1112.google.com with SMTP id u77so1092696pyb
    for ; Fri, 09 Nov 2007 06:37:10 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta;
    h=domainkey-signature:received:received:message-id:date:from:reply-to:sender:to:subject:cc:in-reply-to:mime-version:content-type:references:x-google-sender-auth;
    bh=0Y6W1bg1EB8oyZ+kPaaOFGaRejveHLlNPWtaLM3r6MU=;
    b=UGskOKJpj62NjBlAuTwMORWsOakzg5mJh0+6xusb9x4hzl9U4kHS2WyK9JQZgK1IjCWJweZ3jI7bTC274ylU430cybp/t4XNS77BiMJnEUyy1z4uE0SN//ai52Sax/rH0OXCmEmfxa/9Rc062JWJxMmFUCgocoi70jTvogSeEaM=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta;
    h=received:message-id:date:from:reply-to:sender:to:subject:cc:in-reply-to:mime-version:content-type:references:x-google-sender-auth;
    b=DvvIO4nZO2Wz23PtQv1f0Guzhc7AELqJBbHArF+5LrL2fWTGBn/QC36RmeXux3pWHGay/IQRuP8cC/veXxPJe0Km+f4YgyRXFoBPeAE3V5fiK014V9turJWqQeLLN99W6V6rxsE65N29q7PQ6Q0dm44o1gr2ETKGnFZmV5rQHFM=
Received: by 10.64.203.4 with SMTP id a4mr5752184qbg.1194617364250;
    Fri, 09 Nov 2007 06:09:24 -0800 (PST)
Received: by 10.65.137.2 with HTTP; Fri, 9 Nov 2007 06:09:24 -0800 (PST)
Message-ID: <3993afa00711090609j7ba8dee0t8c1772f8654eb2f0@mail.gmail.com>
Date: Fri, 9 Nov 2007 12:09:24 -0200
From: "Fabiano Silva"
Reply-To: fabiano@c3sl.ufpr.br
To: "Justin Piszcz"
Subject: Re: 2.6.23.1: mdadm/raid5 hung/d-state
Cc: "Carlos Carvalho" , "Jeff Lessem" , root@c3sl.ufpr.br, "Dan Williams" ,
    "=?ISO-8859-1?Q?BERTRAND_Jo=EBl?=" , "Neil Brown" ,
    linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, xfs@oss.sgi.com
In-Reply-To:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_Part_35744_18065873.1194617364213"
References: <47303FB8.7000801@systella.fr>
    <1194398700.2970.18.camel@dwillia2-linux.ch.intel.com> <47314653.80905@Lessem.org>
    <18227.33346.994456.270194@fisica.ufpr.br>
X-Google-Sender-Auth: 6d02a7b44afc9315
X-Virus-Scanned: ClamAV 0.91.2/4724/Thu Nov 8 22:48:44 2007 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 13606
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: fabiano@c3sl.ufpr.br
Precedence: bulk
X-list: xfs

------=_Part_35744_18065873.1194617364213
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

On Nov 9, 2007 7:14 AM, Justin Piszcz wrote:
>
>
>
> On Thu, 8 Nov 2007, Carlos Carvalho wrote:
>
> > Jeff Lessem (Jeff@Lessem.org) wrote on 6 November 2007 22:00:
> > >Dan Williams wrote:
> > > > The following patch, also attached, cleans up cases where the code looks
> > > > at sh->ops.pending when it should be looking at the consistent
> > > > stack-based snapshot of the operations flags.
> > >
> > >I tried this patch (against a stock 2.6.23), and it did not work for
> > >me. Not only did I/O to the affected RAID5 & XFS partition stop, but
> > >also I/O to all other disks. I was not able to capture any debugging
> > >information, but I should be able to do that tomorrow when I can hook
> > >a serial console to the machine.
> > >
> > >I'm not sure if my problem is identical to these others, as mine only
> > >seems to manifest with RAID5+XFS. The RAID rebuilds with no problem,
> > >and I've not had any problems with RAID5+ext3.
> >
> > Us too! We're stuck trying to build a disk server with several disks
> > in a raid5 array, and the rsync from the old machine stops writing to
> > the new filesystem. It only happens under heavy IO. We can make it
> > lock without rsync, using 8 simultaneous dd's to the array. All IO
> > stops, including the resync after a newly created raid or after an
> > unclean reboot.
> >
> > We could not trigger the problem with ext3 or reiser3; it only happens
> > with xfs.

In our case all processes using md4, including md4_resync, stay in D state.

Call Trace:
 [] __generic_unplug_device+0x13/0x24
 [] generic_unplug_device+0x18/0x28
 [] get_active_stripe+0x22b/0x472
...
See the attached dmesg (sysrq t).

We can reproduce this problem on two machines with the same configuration:
- 2 x Dual-Core Opteron 2.8GHz
- 8GB memory
- 3ware 9000 with 10 x 750GB sata disks
- Debian Etch x86_64
- raid5 + xfs (/dev/md4)
on all these stock kernels:
- 2.6.22.11, 2.6.22.12, 2.6.23.1, 2.6.24-rc2
running:
- for i in f{0..7}; do (dd bs=1M count=100000 if=/dev/zero of=$i &); done

If we increase /sys/block/md4/md/stripe_cache_size, the device and the
processes start working again.

------=_Part_35744_18065873.1194617364213
Content-Type: application/x-gzip; name=dmesg_sysrq_t.txt.gz
Content-Transfer-Encoding: base64
X-Attachment-Id: f_f8ss86jn0
Content-Disposition: attachment; filename=dmesg_sysrq_t.txt.gz

H4sICApeNEcAA2RtZXNnX3N5c3JxX3QudHh0AOydXY+rOJrH7/dT1OWsWqv2O9Ba7cXO3ow06hlN
z91qhfxGFV0E0kDOOTWffm0DCZDUISZFqqrbkbpbTSWPDfj5+W/78eOfqy8PD+wB4Z9g9BOmD1/z
Uh0ennVd6uKnh19emn/89mD++1R9ffil5a3+t5+/+4OFPy9+slrr1/7W5KLIy8fFIlrePJ/9+O9/
PpppubR/3+fqIePtk64f5FNeqIeX6lA+mv+rCqXrhVLyMm9Pxn95AN0Hmn+wQBi4y92/4UP/P3/6
+W///Ot///tS/TPziSFAiDAecRgPtvtPjGYXABh+AlEMCGEELBXxmoGuzETGs0qQ7oK7CKhCVxQx
/j6BETgzcPnCqRJZvFDEn3lRPPyz5lL/tFSZ//3PU2GERYDA//q/h0Y+aXUodNrmO10d2h/At5j/
CL5x5WMOoSziwpjb15XUTTOyBowx6mUrwhpJY0tVaaMLLa0VnDFjh1DsZ4ngCBhLabqviuIrz4cK
aeh3dxhCbmukM34o2vQrf9Zpdihlm1flYNLLIsmYyIzFf1WlTo0vtnnT5rIxtgg0xhjys5YoHhtr
j7pN9/zR1K2udqmlSGHsGqMwslaJzD6C2SjisXRvRaVFVT0f9saUkMYSBH5PkWkJQddSjoYYtoa4
n6GIaRpZQ3vXZmFXGeFnA0jV3Zbh87N5YO2TaSjFs725SBmDUnG/u8OxzJzBNJXc+GlqfL2SaVkp
bd00tpXEfi05Egxbm7uyNbdqTKX62z6vrT2YGXsx8bznDNjndnbHUluXWOTjzFsNt40xWdXGJ16a k/dDIYw5g8f3vFcmIEe2evu01F+d0xpLmhpLmR8sI0KBrdjkHoV9YpD5NRGQ0Fh3llq9M62kKIyp yJqKl1C58Ofn9qnWXKmhbx91VCRhaNy3o2Pf/tf/8OjbITIiIcmMqWk/TNi8s2ej0i2MQXJ9ESDi QIJpmVrNywT4jn07NJ07iJTcqG/vnpBpFW39krZV11s5NmJqu3bM/MCGY8askw8twrYwB1nkB1nA setOnM5M69zWiDs7PmZwDFVmzXC5z1PVpFaEprYbrcSvnSvZ7jiSnvdIMzK9R7DuFhM9u0VwzS0u /HmXP9bcio0fwdEZB5EoSDx2RvzQ++QKZ1SaLzsjWuWMvR9wOvEDVyZ9V2dEELEI6uRzOGP3AF0n WbqWbzuzmLre0U9lI6Kksr3j0LTSrulbgwmyBvnSI5kblFhcNghWmMNxzLKTR1plHFm/9uu034Y5 tjIonlQGrKrLJnB4bqqszevfVA+H6SjcOPAJDuQGOGQZXIQDOPezVZ57LFPLK1x5Azi4e0ERirIk YzFacq3VcJBRbDXl8RX2TSLxHJ/KSMiZGcxX2XmL6gTXvdZ1v/JWPqnq0XnuyXVP3nRyXbredTHO gJ/rEqP4lvuTaREITC9oMS9zouM37dc9ilg9gWafkO3XXbNKa90e6mEuSC1Kouk8C2bU+svQGnor nh26sSLQ1ArFq8y8QWUCAvylPTzrvY12HPfe7AYERFdI+wkCbpf2psz3lfYQQcqTIO2DtP+scDhJ +3M4JBEmIzhE6+FA1XXSfn7hFmlP1btKewAEjWUSkcU55SDtg+veJO3hRWkPR64b3+C62ebS3hQx lfY0C9I+SPuAAA9pj85674hFfISAZD0C4neQ9vG7S3uFYpIFaR+k/WeFw0naT+AAbLGcwfgEBwhu gENy/1n7OHlvaY+w5jEhYdY+uO6m0h5dlPZo5LrwBtdVm0t7U8RU2scqSPsg7QMCPKQ9XpD2EK1H gID3l/amzA8g7VWQ9kHaf1Y4nKT9ORwEEtkIDjdE64lom1n7i73/scz3kvaeRQRpH1z3JmmPL0p7 PHLdG2LpRLK5tBfJTNqLJEj7IO0DAhYRoL/osm1+7Cfd5r33ONYero/JI5DanvQ6BHQ/ySAkcFGy zX5PZmVGszLRJJx/64k5MPxho94bU82YaRK/HfRBp0oX/EWr9GtVP6dunya3276E5x5DCjNsm5nb gWea67GdZcyzchmzhmx1dH3S4jyxO8j8tsrhOEnslhx+aKta76ov+uKeVM/dK+bxCX6hhmBNBQNu vHADB9yc/FUgzMkYN2y94iCRL24E1tIbN3RW5jlu7qI4Am4CtwK37sItdFkmTWZA14cmAukrk2ID odibW3BWZuBWkEkBNx8QN3gmkxAiEeFoIpPWh1MSCC/iZkQXBOApm4ArnfqPyuCszIAbX9yQjDI7 t/JlZ/NwpIe94q3umxr2TJ7zdqYCuwK7Lq0HPelir+uHi0O8RI7ZtT4OlEB8jVQ6reQOV64Fy/CZ lTlnF7jvjNKVd7F6PQhIl6tKVrt9oR0XsM27RDxTqAQwBDBcAIMoKvms+rnmMRiAzsAYDGT93A9I LoPh+A0kRBaxeHSBkvhKMJyKmKw22QvjoFBEKIzvkz/xPmAw7pIRG0Uiiuf0UO6Lw6NTNEPD8Fyf DXgIeHgdD/BcN2iYjcc85IYpluTyFMv0G3gSZGbaGL8WDwhhhSSk8cziSTfgodR3wMNSkrzVYx77 hL7vhYm3F2KGkMsI+ahLXedyAI/SX3Lp0hpaaYK8wrpwhii0jbe31RT8i7Z5SKkdlSVe6QgDFAMU 
7wZFNJ8IwirJ4ASK6yeCQHJ5vWz6jbeGYhSgGKAYoBiguBqK+CIUJwPJ9TNMIBGLUDTCdMIXLjm8 fiA5HiaeLP6uoRjwFfAV8GVzOfeZxs/2QiMdnfBF1++FBom6LpZgfsFncQ+S2TyYmsyDmQsJuLCD YgN8nS5cFTv+AaKuAxwCHF6DQ1pWbZ69LMBh/W5rkGTbwkHGyDxnhGZlzuAQ8/tomwCHAIffBRx4 y4edGpeUgzjCAcbrN2tQkGyvHOzoZlbmBA5EgzgohwCHAAcvOMDr4LB+eZ2Cy7MibwcHeAEO4gwO 5kEGOAQ4BDh4wAFdB4f1i+sU3GHOwRJiVuYMDvDuSRoCHAIcPjkc8HVwWL/ITMHGcw7dZ6YcZnMO IkMwDCsCHAIcfOCQ8sO3a+CwfrGVvrIVaVM4dFuRJnCIw7AiwCHA4frVikbXeaUuw4Ekp9ywMFm/ lEmv2+vjHcMwK2IGh2NIf++5GEf9bqC7wYEylumNEkNiwQi1ZwCX+lt7ir+w7gy9zvc2hmhkMzmK Q5NmVZ1qLp+sQWONQWOO+QW46ThyZxO7lnVyYyRslkkM/M4e3wQ1porUHbU+qyJYVcGAmutQs1dZ cWieetT84+GhPpRlXj4+tLx57q9axiDsKUOem698r8AZw/rP6egaRFavuCICXKzYGcNOxIGA0hkO CFkGzMQgGuevtGWq8VnwkWZMwDsLHKQ0w1uF2DrblmFZk3IpddOkXUaG5qnOS/MbCw6HNOkVBoaI No/PerizY0PKhA1OIzY6DVJfW8KlzO3amTUDMren0vdc+i1UE9Ecw3HlwKqqfSCIQQYSu8E9K3ib NrpU6V/+/pd0ZyAx3J3f6/vQUOxeG1yGIvHMy83z6jsrzVk2srw6qa8B1MZTQsMbmJU5GfVBHFaa w6gvUOw9p5gMbF5fuZ7AZnUaYvRatps3g40pglA5zs6FyHyKycAmrFwH2ATYvC9sXl8Jn8BmdQwd ujI9zXrYnJ9lggjEc9iElfAAmwCbd4bN6yvrE9iwG2BzZdrQtbDpP8mszClsEAsr6wE2ATbvB5tv WVNUjxfyayGCoRrlikDrE+gYUXTl7oDTDWNCBFz0mNnv0azMMWyg1FTcT9l0ZapMJHKr/FruCZkm YV5h2ta8bGziUN3qlOfFD3ZVyUJnKWPpbCczRET0JsUhS/NKVaUe72YmYTdzoNiHpdhZGjBHMTmm 2OpgREOUK7cx+FJsxCgVSQlmZZ4ohqAwCiq+K8WOZQaKBYoFit2DYmd5u0jEBJhQbHXUpCHKlfst 1lOMYQOxeFbmRIupRIr7beMelxkoFigWKHYPip0l2jIUU+OUrGh9phpDlCuiAKYn911Psf77MBNc zMocaTEEgAHMXSkGKIhVjD6TFnMP8Zxi0tRQcPncDI2OerEswDHA8bPCUfGW9/Nts4GqnsDxhrhR dlXUQnZ21LHXdNssbpSdn9Fzp3Muhgt9oXIrOOqMqB45tpWKvHIn9RRFJYfWEab4A3M+LnPgJeaM h5XUM+Bz4v8LOfLtN1QWvy1zzo8hHE4OC8wJzPlYzPkcjEAXBm1TRtwQus2uOmL5jXXJ+RHLQZcE RnxMRvwxdQm+xJzJWOiGCG62KjHhjcyZhh5QqKgA73G019IcSGBOYM4fijmNbPJUP6VDuNNRc5gP x4AemYNvCOSmROn4DrGVYFYmPlqEA5fegTkbTU6/YWwllgoJ2+S71lDXVZ0+8VIVbu9vZPMPIOXV Xq1FAl+zCPwNfixEfAqXhhddmsYnl2Y3RDAm8O4ubcsMLh1c+o/r0uiiS4/OMyBsfS/NYHKdS0+e 2vUH2NkiMONZPCvz5NK/w7OaNjrATiYK29QaaeqaRt7qmre6z0Dkjm1ixijzW5oOyPgdIgPPkNH1 
mf1GqQ4Z61UAo/QiMibfUNktyKASzZBBaUBGQEZAxmbIIJdUBkF8hIz14baMRpurDMrnyIgCMgIy AjI2Qwa9iAwiT8iI1odvMXgHZOj5wCQgIyAjIGM7ZLDFgUm0PrKCQbw0MIH4toEJY/OBSZf+OCAj ICMgYxNkRMvIuGH6Ey8iA9+MjPn0Jw7ICMgIyHh7ZOwUTmueq9nAxC4aYoEidULG+tOYEIvEeSzl awOT40/UYgKUy7v2TmWiiUUmKL7v1mEUxZCxrTbdZRFTdhHUvT7brHCCTYvA3Kt5EsYyZ6aRT1od Cp22+U5Xh9YYhNrY475nDHBhW/u+1nteG2uVAVnurNkzFagXawhAFNuNhTt1iptS9i4h8AzC2iK0 i9hOlc2qB1bV7gPx6g8V17VTyPEPXuSfOh04RZL1EzNYJNfzb/BumKjF5wzIYOwMoLZMdB6QcUf+ 2XjcSLLNzpRxT6jnH7TNSjoyKL/m+YaxJAGlAaV/aJTC76B0LCWT9WH5WKk1KJW+GQHBrMz3Rakb AiebolQmE5RKFFAaUBpQ+l4oBdeilK1Hqb6QeeEtUNoXgRBSWCMOZmWi2QUl3kGVooDSgNKA0j8C SslkgjPKpp8RSlcHa0EJLmwW//4EJ5EyXtRz0+/zSVpCW+Z0ghNyeT9V2uOOISSXPH4tSp1t0yQe danrXKaHcl8cHk/Hx0J7QOIiyD/BpClmCXHweq7Er1q2BveF5o3uXcBzQ2zgc+DzJ+Ozbl5KFwP3 +qGSFHjOv37LGnHIhtPCv2fYM3mRMWyr21ueinMuMzqyvDowx4jITDNPcY4TFl27ym7EOY9FEoNZ mXByAWmUvUf6gaU2tbZHcU/INOJvRfWYCtW0tfEw6U7btQvheCnVyqwTiEC/bH3WCcTcX1ijLHIZ L/d15Q4XPlmzfrZ08sz0RmkEI+utQ1sdlvph5Lc70thx6BjZAf5mPhZePzQOD0bdDMw6CxQWo03J DPTbkeDDn37+mwdcjOdDnHF4llENfTdQmKKILUqtuHdfM65PIsDotMxEHss0rcr8K+LvEcKz0Xp8 94RcsE0XvJLudmnGD4X14IQpK+Skn7p585G7SCxguFJOEqa/HfRBD8KQLGaumRhjCmubd2Wf73W6 rwqb5xfZm4z90BBhjaSxoyojdQojfq0mzyyrCF0KtppZItjBPXXV6TWv9Untl9EXYQjtuqLS7u1d lKmvvceCN+3DztCbP+oHI8A1b7Uy/b99cY1XLUjGhEXmv2x64qblbd60ubQxW8SGWDE/5kURj6V7 Niotqur5YHkl5AqlHDFNrY5X+64ddjb8jnBn2pDM1SYzb6r6mu6qQ2mtIfveI19jEoKuAR3vjLkR Cve8M8GwfeK7sjX3lpZVqr/t89o5SGYbtl8PEYFIdo+8yMvndM/bJ9OUCqvhmaOBXF7fn7VvQGww nzTjp7R5aU7+AoWwQyfmWcE3axMGLNDW7DiuG7PFzn565tRGPMmi7pX2bswj2zK4Z7qkiFBg3+jk YQlLTsj8nv0tTHil2caZBn3dBk1kMcz8zICExroz0+qdS49uozDtPcZL+Fz48/Ov1aEueaEuSZEY wkGKQLr+OEkgOdbcc88SipdPFeoNDKHC42GNK5POLkwDlu8kRRYddq0UcU/IDSXysrXEIbZBUL8e XgljxurkoR1Yb+Zuzg175sjfJK+aEsB50Lh+YFXtwuDkOiJMp1SmMx+JkPxEhPUndJiRgbqOCJ6y flrE4O+nMulkiZBweJ/T0obaGwppARY7uNVEyJKsE8e7SrlxhI2tF7ZXlV5TCx961gMRN3Hat1Or tKiTM54HKCGC1MQM8LcSsOKDlVcnVAUAaoSV9YFjFNOtJ1SNB08mVF2ZcIQVLM3fw4Tqp0RLmFD9 
nHD5/iiGj0YxbHWAPxQsuvcoxpVJZxfCKCaMYj4AEd4weu1jw2Vf1e2O719ZrUHJES5sgIv/ag0x vaxpw2d55r8HFwISFC22w2kRSTy9wNVoBzYkipP4zsFFlKFkqyMLuydktUYln1OXf74LtWierXpx QUFQey1AEGSavW2q+T7dH5qndG/s5eVjmtV853ZbY2BnV7HwkjEfeRXIjJOQW6Vo5X5YBOpWBJDf YCnCCcb9KtBLc1xPcvOY2HNl6iOsA93L4kYnFmxilXBEM/deOnfb80fnFZw5W15DBUSoRnbJITPf SR9166zZJmMXx6jfMgGJtFv4yfJCG5qnZdUbg7F1CcQ9Q+ZYnLl1JPvcjJfVNgSv+6qtoDWJ/aZc gOpOzfii6zx7SXNj2J6WQeyDi70aNQERc6a6JZFS7ZpH99QiF0rpF6MOEsAtmAqLUMfRUjetdqdv IG8wERZjYFtd2uzzMnVGxdMPXd4Lz0NBIszRcJc9SzBy6P0oiyvGpD3suFcO/zPb4aFPw5IoWh3n YTruJMHXxnn0J87omCy7z+n7gGqtwbTMWI+kBMQJF/eZSp2XuZFy+LizFEYqJG6wk5tu9Lx2KHa9 qWcYgqTQLTqXhtYW/sOSLIu86bNRxxIjJG0dv9Z5q+3h0GleVqrrW9yOcD/9QAR0C/eCF7yUOlV5 3b50vVVq0+cU+c6Uo9KydtNcyilF3x6HSDaKbLc9jz2OOtO1Mezuww137SOmKnpv2SKxsuJMHupa l22adY3UQtXGHSz78jTW3cjPxMmAw/54q67bhtDPEmUx66f7j4YiYN849+tgMUWKDqeCu3fBjQcN Jp0b+sXzG3uCvmaPcm+DyPSNQgz6uJQuYuBLZ/GL3WOGVkThRATE8HI8SRaviCfZxrmNyEB9eE9Z tVb+1Lo5uNbHsPW9yHddRqhIOt9ry8bBMVP8ZVAIfhLobY2xmKOku9XZ+xXu/UJfcxqgXgQdLX20 GJORCPre7iwYxXidDIIIcoyFvjCB0n8DRFJnCTg/EPh6GWQkG44YmJWZjCZQzNBf3Gl3luddrD+D 5y2nJTaYj7E7oSg/3wmV6m+yODT5l+OkB/ULyyUqw9axDmX+rbOYVXW6125JPbGM436D2W32RhFN XEpBV031WPPdaMRnwOU2EnlpCzO05RgcX9KoT1P2MXqKIBYT14CGPu1oq9MCXvv6tnmEmCrmdr9J vk+7EdsPw7ZDL4GBUCw1Os5v9XYSYvV44hsT/oZT61t02CwW0nrdl5EqE/aReXovSwDR4+7r4/Ve do7i1c0aHJ+m/2OwOnkiRBnVZ9FO399bTFUiPJMnAoCnZQ5ZGcYWf0e9FxXy/9m7kuXGkST7KzrO Zbpjw9aHukxf2mwObV0fAItVwiS3JClVqb5+wh0ACZCUgIBAilKGyqyyikl5BBZ//tzDFwOe8u5F l89m4/VIv7jtegl6mRkM0YVB48zmMFFcumZ/iLbHkmBXJwyETo6dMeqfaM1oszm4cRhbAK9GsM8O sfNEp3XwX2+h+KKEqbuHa8XSCZqEWWenHGVNmBNPj4nF89mwBzBTroVTlojOZsi0rVzlNHS70dXa 604Lh0f4ELYQmnTgcHqrBVqo9+dy57nwd5o/aFf/U3/KBm9R+/32n5M1u3O5OeE0vw0c9hDGW/6r kfn5WEUcyB3zsN4EBzrMlej03HFCs/fBoeVKgSzjHbKFa/bAQUhKbnfgEcEhgsOXB4eV2x39qH4G eEIp6YLD9CRNJvh5G8/u4AqasCTt6XZqcx3oSFHC+msacy7x+zhSv0RhyT07PNZwvGugQl6AZZix EobKM0GMNXWCT7MVMm0nnw0wgd2AumWkeXIRYK7IPmBNAJiL34gAEwEmAszdAczkpmDEFLcHGL9m 
BJgIMBFgvg7ATC6SpSTLbg0wsGYEmAgwEWC+DsBMHh5KHL8cg7kiwMCaEWAiwESA+ToAk04GmJy5 mwOMXzMCTASYCDD3CTDtEbPtAEw2GWAKocYBzOFXONGO5B8BGL9m7xSp+8G3AJgICxEWPot3dGFh cpYukWJE5ISdd+0YfIV6Oiu6E4BgzU/kHYwQE3nHtwYYf9cMXmZZmvUfq7ZPMbZ0LsKK0SJUjYGq JkmOvcFgui7S5AxaDxvFrZPkcM1ekhyVHstiktzFtysmycUkubfBgbfg0CgTYyx1DLoVH8CBTc6g 9Yo6kF5/FXA4Ta/PRXHrZmMRHCI4fG1w+BtOUTGXmIPIO+DQHA9PKKSnlo+bGzUfOMCanblR/W9c GRwCr+I+Cunvtb/f95/yJCZMeZpjmlJGtHlj0JDKJg0amnsQksO+bme70/B2h3ZO0qmAvUn/nF6g vrNp1SfhQk1gb6u77SE4+6yn1NVDgZfytVxvLLzgCcqhQe8aNG/CWAU2b2rDAdCbI7DVDc09TNZ4 APfpMCdPcBCWpWGcKJU6SdGobo19aS8QzaoIayORZsw1fZb8k9y0onCwHQ+c6WwMvva7/XalN68l VJeXzzuMYHFEvtB2TXMNs7piR4SlR0a7vRhr5YqRDg0pyDQaAqMki0vjKzuFPDhbUh2+wfwH3IRx hJxkx+Hq9ZpAQ/qbEJ8wymGwV02Mtf7iPKSYwkNmtjf322SNSydS9OK8ILT6e2gUSTHgHujoykLB K2BXSLHLp22rNSl09knD8J1Lk8AjaKTAzM8tPABX9/YL5FxfwVhsKv3jeTPKWMB5PdqM4B64ko70 WTvWY5SxeEtAvWbPWOAmbmMs+ptwQ+zn+53XfzOIv5lEkiMslrUsz+71erlEYRx0PRD/GTHYgPsg DuRAEzAR2EtdZgYuFFsg+atFn0OqBTafBZBNwuQJV2QQPcWeeJ6Z215TPJbj0KrAGMmsTfGiIY6G +OaG+Ofy8eCznRriRHSDx0U+2RBbetlreyeyy0XC0kEH/rAE15z1DbFfs2OIKSNK8OITgsceWK5j iOs79FvdaXq/lat6mDT0EodoCoQTmQ5sZBdN+7cy7YJohSalnbtSd9TNMTzGw8JjwqV4HPDXemUB F/eVNwMaYBs766VhB2qzDoRJucS2o1r697YRuLWuQpS0QD3oIBvuqxaT9ekoqJaptqzcOTBTxq5q Q2XroGCYOsxs37nXL2O7m4SIvf8Pv8lU4H/o9tUOM4OcEybbHvbV8woaDoJBxWBvaHd97fhvnW74 AFF+axgdR1MoA09i56MOs842YKlJFQDUptrYslrrPbx8KfCGNGyIw5egDbvd01uJtZwaecxWK9r2 txNm2LiLZ85v59vXv1Lk42kDfl+Jkw/u4cz5arQhnjlHtvDmsZUnjACICyu39eEczBrL0PeJvOPq Bv0+T1h5nmtQcFXtUcEP+g2KWQQOZ5vNtnkUc6w+dN+UerHG9tgJ0LLBTL6bGclq9be9229Oh9s3 iVlOHY1kW/U6wUhapW6bmFWveRdGcvgq7sNICkFyC646tJtv7RrONqSB1ugXsGtZJvM6p8mUi/X6 BwaQVZ0dFTiOwmrazI86CEInhsrALcVUrZiqFVO1vmGq1tfwtiHDG6ivuUQkurPmiySZmuFNhBtJ JJpfwX7HqecBYUt0R8vDBx0iwW5JJLpXYYrBTs/3QSSit/1ZmVUsPLNKEMvxvnjvt9TrzWtp5F7i MLbDGG9Ad2oDySTk6zVkEoqG6wNsWiChdEGY7Hlp7tL2ALs+9u/I5ASfYZjEjHPVSjyKqg/EQ+cC zjlk/EolTzNzAEENw7lzK7tHKlatdhbPoCmaNxpm3+Ye1H6/c99jmOIj7KIyng6b81i+8r7PkV2k 
0xO3eZZZEcAuGKOZ4nbY72i+f0EArGlcN3FbqFzlN0vcZirLgOWIK4Yp7vfAPtcppoGBDlmw5u3w eCodUpXArLL5x8V55UyaDW735a56hMteyh04vQnI03czbXJppFm+cdzmVdQcVTSj0x0AlY07bmtX x18Z4QC8peMHAV0lpv6xmBtGElkbm/n1VPR78fY3JBKjQCn3Xq/L7c+yyWWox8eHXaTfHAFmtt++ 1vOkD9mtCTxIHphreIdxztlSd5lAH2BpDpkQlCWYQ+nB6lNDbHmRiaK+V23UiMC7wIazBk7kFDkw V7X4cRLMSsIu0AtSyZkgZsMlpRlVtRJCntL2EELU6GSGZUt/jbiY3q5Xb2WhyKQ4tncr8g+YRQ8g aixzPeq2GzxLPizhfSBJNDlZM80vf6NrxK57wJaTVA8dr08tOcQ7BHp4hCyMhwVmXd1v+rc3/CLL a6RZydV6t7AWrlKAfc3CUsL8vtCItfvqykswv5GFmTIucyc7AgHun9t+JIGt57i0puXRvQvV4ekZ V8SKR7vfv55jRR1D1+mBQjNCJnZJgYNx73HKsMP4xDFCJ1PoZs3kcw7jO0+O5Xzw8GxyeXJeD0Rv 5qCXer3arT2T3lmc2a7wCEkHqfo9x+W96RKYLYDp/PpJYiWWwVTSMElXyBGTLOGQNOuVqT1ZzID9 qjB6keYaY7EvbteKwfihCRXjLGl9+KZLFNzw9G4891PYyVz/pwM7E4/uIGeVXIKdo+fOqPcb8x7s wAcfgJ16zQg7EXYi7HwF2OmxHaW6sJNOZTvEMhrKdjJjR7CdZgkYQZjkrFvajmueww6PsBNhJ8LO 3cHOO2wnm+xkEW5CYSclKgR2aGFykpOTNc9hh0bYuSXszNyoIqJYRLGPkqfpxY3SFaEoVvDsY6Ei WDP6bJE8Rdj5CrDzDnkqJsOOoSqcPKWh5En22pHBmpE8BcLOzOkG94xiMxO7CIrfCBRPG028DYoM +1NBv4nwrsLqSsMNDqCIbYll3l+zzXxrNJQZJm8cyEoYS4trNYrE6+mXl5AJ1SVzJSXNC4BcEoO1 eqDIzdUlkNiaBuY3fascvMu1M3xC7QzPtalLGPH1ed49lRu7MtXqsXRbucSmDZlBhJFBmVsz9qaY sXxBkNxlqsMgdnWeIlZtFmGWntlMZY3eHRtcFgrrM4rAeo+7bnB5lYKeK+SgztloihdOZ6wBnoWp dro0dmuh5x92m8oCc4DnK3UpiGhIwOEtuTMyoeTu6UK+YPsjj2SCQ6ULcopgMiELavMLZCJv3B5G ZCFJmvR8E8hSHhnYobiEMfnJmrxbWOv9LXfB9F+BTARSoske1nx53HO3x/FP830QKiaAkH9HGlrQ WHLJACtk2BQ1cDeylqW0qdhY2hLaK+8a5TfYtPFi+Y1K7qr8Zisrs19vTnGjKb+xndzBhD808BE8 Yc3z0+Bu9UJ8oFt9vWavSS5u4qbd6ptNxG71X53630pioVIkkfAEyk0FpQ5QI72Xq30lkXTkYBNk 2JXPV4wzZ8HRnKKkkLXT07bk2ddQy7DvfVjxy8yVUPc5gGD29n4yyQg5CUsmVoaHJWecKTsbQfgS 9ULvda1VTHeCiYmYGkykqSiCutZ+2I7Xa8ZgYgwC3gcTmC8ISAp/f+FpQSeZHfT6boq7XHiALQYU Y0AxBhRjQPGKAcX/PDxsn1crDylI2ZtPaz6RPjS0Yhyf+NPt1LMzF5lKoXQnUpk2ibTs4b/+97/D Ig6FKKw8jeCJtxt+HMjM6CU4cdJ2K5txzeRc4k0jDiwREIy9VqTyFximyhMmCsCM5j0FrEjQWQ6i YyAGXbWjGBIuBVp/pQA5P2p/CGACetZnYeBPJJfA5fRTtfA+VQVuo0SKGRZdTgneGbeQe7SW5b/+ 
/a+yiWtiP73AWWl5zvLexZFJ11bYk2sjY65tGKZ2ryt92uG8bsBAWScwmuaTYUqTxKYXYKr5Bhx/ 5Ir1p4fxwitPwJnNUcBxTXoGU8ktYOpkzSs1YKjvEKjeYv1YKrPbb/37qoG5UowIhY1C+DVAL6NZ M3AJ33rw9uAKaRYWc/FyWN6XQ8LF3Bfs3TVMmU7U5+Hhnx0mkKbK2vwIUxmfllnrIcO4ImHvjaaH b4jseNpCnfKPPB+fRMaE1rnITySakzVZnt2QTXnvxjBNB/uwT4WplCayDsw/2pXdVrp8Xm0Wz4/e R3mptG09WhY4acW7UKA6b8qEcDELmxnnvMgMZe7rIDuMq/HvMPaqx8NbkYXp1PzZpy5nFPa4BElb +/PZ7vDkG4dRJEUY5l/FW+YpEdjRp300J1ul9aGMC0PulEmGgaRntaz2parWrXVSgd26vIUq8HX0 MkpIfm4HCGDeMw0LYHDhqJK1JSibvVVr4Iz1DjEkyYJ6BXmRiSouiAT7mWNgP8xWEa5FIw6zJaAP PSa4v9RtiRMCRidJwupWEu8RtxMGX5a1e9/OiAB5JszFF7Lgtj7y6UqCuxcWPxX+kcK+UExZD5Rs Z0TRrJ4REfbGwNbE2dbIlK25tnPdQRJsi6GswNeYZkofN6UkdHOuVo+eblWrtYEdaoAErsIGTwiX ZvVJHjxYQEIFQUzYJp4Mhg56yL2Fq4NJK13uVL05fBYa2nsxFRYvzBnDooXjRR8E4ggWE1YdIBRN gaoquZArbWHkJpwDwHMpPZu2i8prnzXlCsddEhYOEExkQqcdNMQZmt5VdnbrBR+iYhZ10IQhRZLm aePDHyRlBKyp5GExP1E45RrTV6dUbddL/y/rbwGiNkfoETrMr0+8Lw2q87yCWS0v3oeGa03x+DsJ Os/gCTOqxRu8h93QM76aoc0Wc4FAAUc28HIewtgwlYYOl0vdwpLOMwXGU3JCk/bc+i+7Rcuk0fGx gRNgcqXaqpX2hkmM+4dB4ReIDPddjnciw1lg26dTX6YbclHMkY7kyS0WuPa27tyXeScXDX0ZG1AQ A74MEXl/zW/uy7iMI59/kiuz6HgHVOCxDRVpkBrwNCs0MALtfpZwOnpkycxgDIeHcT3n7WnxhgeD SUeBDgy4G6a44G4YFd2N6G5EdyO6G9HdiO7G/bgbSZKcuhscUxeiuxHdjbtxN4T4BxUnf/376+4/ Px/8n0/rPx5+B5vZ/wXG35V39teDP6Btb/3drlILj+6DS3RcoePPv//nIGbvIdP/uanMg5P7J7ut D4YeXtfPq0f/f+uFsduBVapVtT8K751QE2OPIwLQT2r+59xdurh/fKrgNYhUXpqc895wK/yVXiLN xSXeFxA6g/vNq2i/H5zyWyv9wBKX3aWLm5kAKINywk6UB8UFnCgPyxqb+DtC0qjE3xFXF3jmNPwY Oc1z3tjCyv7pOdpqDVUj/j+B99Xzfwdf075IqShrcxdqDwG4QKXx/zcSz/A4JgmbIMnMM0qkFn+t V7Xv4QlFpTHuCJa2N85gxEMZWVU0LGnkiJ8RWxocZT5CRtgo8xECQ0aZj9nf6FHmYxQrJMf45tea KipBFfSmXNk/8JX1kiywExeGa2MTd4cljeU5E3hJk15ySDg7rpqIImWkY87ZwZyfJZy9Z84p44oV jg3mxTapHUfcJMXoJaiwtpdwBmtac5b89W4Fz5zmvEnO9exbDy0x0ZyHFQ0Oi+N5mmbHhCNspIWA yMIAcVT+07BJgtABiJF6U5XGWzrPO0swImv1f7UqYT6UDrzGxIn+NZJplzicVjVBGZfV41YCL/g7 OefWUicdZWzSqi5lfw4qoxmXpD5FGRs9kElPD3DN8yT1WyojUYXLnC2GPIT7UMb6BqKRXOGbjylU 
GL5MwwgxE0ZDQvjh1SoP+YS0YBgPHXSaTgTqehThuUAyQdyYnMtbYc6YnMvPA4cfu7XbV9ufpg8O dRNlmRcdcBAfAAfnztK0hytYeuXWIxzvbhYnrGkvjOK7reM9aonJ4KCzHDjl4RE2r0QR6ErqOurf FcPlJDlzbCeq7ljV/UPu9ZNZP6LmnrfJ6pLsZLrqcu4uKOZ7dr3uexC2BCP9D6w6XbPH47+46o7v DDEiFMLTBPSlfRsaKYEG3UtRrC8l4ZPEzLCZCAHh1J4OUfv0AxCQ3Z7a+zU/m9pbmfJI7SO1/6rg cKT2PXBoqH1TkF6DwzvF6UPgkJgp1J4Mau5F639YM1L7SO2/seoeqD29SO1pR3XfKdgeVF13dWrv l+hT+8RFah+pfYSAAGrPzqm9y00HAorpEJB/ArXPI7WP1D6CwzzU/gI4UN0BB0o+AA7F7aP2eRGp faT231h1D9SeXaT2rKO69AOqa65O7f0SfWqfm0jtI7WPEBBA7fkhO64pri2KTHcO7iibDgGKjqD2 OZuV2vs1P5faM5JnKibkRGr/ZcHhSO35Bb9fqg44fCBbT2W3p/Yqi9Q+UvtvrLoHas8vUnveUd0P 5NKp4urUXhUn1F4VkdpHah8hYBAC7Itd7Xd/b4Jup9Y77aTT0uk5eYImYEnHQUD9K46mOQ2tY6Un a2Yna7Ke93A1CGiXa//iStabJzaFpgo/n+2zLY1dyFdoprDe/iixpFJC2ZcKK+kQLsFmCi9LqPjy noKpZ5/Bq8YDX/vEpVCfCRuy2yMbl9CRw4UVy03pLDDqBmLbmtMdkikbjIATBDj0DcDpugvpdM4h slDAUYxn6Sg0qL8v/V77Avya54BzE84RAScCTgSc9wGHjQCc6VmFRIcxHGgJWEjyIYYDa0bAiYAT AecuAYdfBpysG1WZngspKA0FHM0yNtyk4vB9kab+X+RkzQg4gYAjUpI7AByzbhox6oWVrT7zLCJO RJwPH8E82cXGbh9OEKcunEh6QZzpqZeC8nGIU/+RHRBiLBy0PydrniIOuW0Qp/251hEM0djKSa+X m4VFJlLPFA/sWhKBIQLDBWDAvrWmCe/+3nUevF51EzfEB4ItdgQw9Duucy3F8H1ulmD+J7Umz0/W PAID77Gd2wLDUOOpqV0K8Q79Bt0Fd1Vpqt0GDkRKvcQMCYsZEmzwdLsnMXw41bBMxxIKL24ja7eQ L9iHGKeVF0HtvTxA1J2h1eJHuzcAi1YVAg+BIyBGQHwbEOkl34wp1wHED5SY2iKEKY3mGF1fzmpB 85M1vzcg3jl8RcSJiPMe4rA+BavjMVR0KdgHKmOtuog4h28wr8tZmvc+CECcZol+jo3/IOtOP1ey aKeI3hZxruSbRUIS4eFm8MAvwkPPQ/tA1aw1gcfhTOQkTcbDQwMnJ2ueh25uloJXp9YTKlN3pdT6 SSOEI05EnJiGE9AA+DCi7yTEy46jxh8ekg8U0LqRh0qBBvjk+30a4WiPRnjgSGl+4yz7UQnHd5Cq G8EhgsNb4FCu1vvKvQ6AwwdKdN3IFJdZwSE5AQfH01sPo4ngEMHhC4OD3Ms2vf8Sc9AHcKD59Ax/ EnY4PBUceh4GqQ+Hu+DA3K3r8yI4RHD44uBAx4HD9ANiQkdm488KDtkZOOTRrYjgEMEhCBzYOHD4 QOY8vdJh6bvgUJyBA71Q6h/BIYJDBIe3wYGPA4fp55qEjjy4mBUcTB8cCFU3b/sRwSGCw9cGh1I+ /zkGHKafahJ2k9OKPjgwegoO0kW3IoJDBIfxpxU7u63W5jI4iEwewaGYfpTp78EkcBicjt5fgpOT D/qqzHnW1LPcDBySNHX2WikPngglMDh2BfPWD2kOoM40aCi0F5Rk0P5PPe9Kt96WVuonEOilpTgQ 
PWwWvM0zHGiLb9ZRjZmC1oSchA2svgrU+C0mOJ/7ZItk0gYj1IyDmo1xi+fdUwM1/zwpxdXHsSSM T+Yh/iVLijQsuwp/ZXBm88nvyxMBZ/0Qb5Xu3W6C50JcLfnSWekg+XJpyj+21d7CCPotzqBX8GKQ 0J6A19BpRyiW9i5B1Nb+fLY72CH2PkyKsNJekTMBZGlpl5v1elH627bWoNqQQ+aCZPGUCEU6qfIn +6NKwy10QT3VeMokgwaru2e1rPalqtZelEqhIDoM+4WjSLr+dLuyEVat7co0IqmFBxF2wcIlJr8g EiANoFEHwRFPCNeiEbeRj/ju+TdQr1cvFt/BhIDUJAkrwEq8cU4asS/L+rUG8e0uTWAnC1lwi/mH PUlw9wZZzIkgY2BftZ7VReogDSohaIbdfWWYhYKtibOtkeCtzdrmUNbZ3GZ93BRcIcNthV1fTnMt j9enpP5R7qrV48KW1WptMPfTwCPlKqyOl7mCg+Yauyj31dL7JLvXFcAAamzQ6ybSjOAF7/zzNM9+ ayBw/QzvL3ZYDnymOTUOCYzfULlT9YXiK6IF3EH1/+xd23LjOJL9lXrciI2YwZ3kS7/Py+7G7gcw AAJwaazbUHJ113z9IpOkRNJyUaBFWXahZ6K7Wi0lABI4eTKRlzh7KWesMN2mwwd4EphZeLtRi2XC 8Ax2HEh6qncv21ZttOuFd1xF7heTFQDuLYfoGImLUzzC5Fjn+iyFcjNLDnNvr2/G8hKJvAmJ5H0S KWbbq4HQGW3vmzOIY+Zk9IG6mzOrN6ulSOQiIfrcs8pnSK+OpQ708QcQhLDfQF54iH+HoxS3/xmn VCPse/2yPl7kpnHUdJFsyZzR7DXdFbRAvjtpz9yBkT8285XUI00tgQGaFx9Iqt7v1z9h3yiQx+JK nXPJAlNtGWUjr2eFwFtRcQyVF4E0gLz17qk0NuxqfSwrA08OpsejXjEPqiGnnbSWxwStD4Jc5LxM 4dRJEpJxpCEBFkDhwNxoFfnoiPGu9+hqt8biK8iMoqgMh/AB1r1WmKHf1XjIJOy2uA4bXGRamM6U CWsMmw6mhSycTdW9Gi1RVZy2osL/WzP6Ze9qeKGQ4sziymorTWVxooAgCOiaAvpnYunMbciVILD9 /zTl86mQH2v8A3Fo9Ghkjzs6XheZsayHInlUEXzGfh1A5QAG/z/+5x8BoQ9dRq2ITLh/ZNL4fPhT 720Xd/1/A5KV+15eJxOzUzIY9BW/1BVl8A2bj+4OCzZpKYw4Jh1JHIRKNeTljqSRsoBtdsnWCuEJ wenTQdnD5URbm6shJTns/Dhdc3t6x4Qz2Iep2WYANATwXES6G5a5VRVOo945TY7MmlqCro+ELnqC rg4FYFjNSd/end3tjUFf9asaOvUOvQg2xqTFMhzCDs1bb893wmwAbneArvYDXeml7N0ECwkWlovp Wu0GeWSnc8WFtlr0Gc3sPm/MqPvkkdHRmMNCNjJnKeDzrYipFNOVwOYuYPNmXtqQg8zuTBcO/n3y 0uhozCHYKKlS6kkCm8cCm4cHhzfz0obgMDujPZzL++Sl0dGYY3AgqdxFAofHAoffjom8mec2BBv1 DrC5T54bHY05AhtRJLMngU0Cmw8Dm7/8Yb17Ovdr+AXYzM64Z8bcJ2+OjsYcgw1LeXMJbBLYfDjY 0GmwmZ3BHw7+XaoGDrNrjByBjeFZMqMS2CSw+XCwYRfAxlTc98DmHZl45kqfzXnBXAhDJ0/MaQhi TGaGQdRm6LOhSpM7OnT7Yy4DNs0TasMIj7XeHqAxpju6Uq8g+pIj6ERGJlJ2CnJs4lbtbuv63RVE 6q6QUOxhUexVtwaRKev7d+Dzq7AHRLnczGXw79B3d4Rq15cuwCE4GX1wRjHGmGcq/5D2UYs1c7lZ 
vHcCmwQ2dwIbq4+69QYNQojzwg3A5h0hxGaW5zmfzN94hVbDMUf2GWP3vQNv5zUVTT+bMjkvbAs2 bWY7dhNvc/pxdyQzLWHO42IOvYQ5/bQF+Y7Y32pWkN/7MKcaB/lVNLvvbVfCnIQ5CXN+gTnsglE1 xJx3BBZXs/zQ78QcOaoHh7H/CXMS5iTMeRDM4ZcwZ2BbvSO+uLoivpjl7LaYk71uu3nfHKeEOQlz Eua8/s+H6rAq3ffyjeAedcIc/o6wZUGly+983w5j8pNE2uHS13Ee3/C+nVeWGdjyzW6o611dftdb u8aiHBnUvmU2rhxRkCjoWxJJvMDHgohPcaQvh9CIcylrrubH6wlt7n+ktUlHOh3p3/hIXwpU0bxy pyMt1HwtLW123ZEePLVi+p6yP4TWPh+NeT7S/MIQ9zrSUxfVc6sf4BP6Nfcuork3rwrb1rLFrbE6 uhrqjzWV9KD+lYTyYyouACZBxheEDH4RMjzpQcZ8FhAo9vKQ4UaQ4XiCjAQZCTIWgwwxggz8q8sk bCBjfjisdPYiZAy+Yf27IMOpMWTYBBkJMhJkLAYZchIysvnhYNLTKcjw9L2QMTZMwpgJMhJkJMhY CjLUJcNEyD5kzI+skP4yyxidrXdBhh+zDJ9YRoKMBBnLQUZ2ETKyvAcZ73B/er88ZLxiGT5BRoKM BBk3h4yN5WWtV7YzTDI//OsMGfn8OgKK29dxTb9ushl+YieTYl8HYw3HZH2J0kjD75vaC/9Glip1 z32msPMfvj7YVryAfkhcR21PoZRHMRc6zWE3x8hOczzXBnb7vnZ7XQdpuxJq8YM0aIMio7BGECZ5 20j0dHosrJKSyCCsJUK7BJjuajQ9Mmt2Ca+uxSuGeNV4Sf7327f6ZbtdbZ++HfXhuQGoBq6KSD/K xtKe4BN3ahrFGSZ7kucHlfLMXA+E3TGnxXR3QE4MpPzS9p9kNCb7mG7DffA1arHG5viEWiCksF+r ChKDbVz84Q2DShKmJkz9VJhKBtD3Ngcs1HzoK67ggB2Fi4G+LBgHinnf/HMIfYUdQ58gwtwR+iAn IavUZK/GBH0J+hL0fQT0iYH5O2B9Prc96Jt/lc+NjzJ/oYKLqNi0+dsbIhdiCH1hzIH5G74h72f+ tvCEjWwXgr5FmuQmkzph4G+JgQ7awgK+jVukE3HCQElmxybQ3JtKRrkA4Sf26jyIgJmOKToWUF3I rLgr/Ust0mHr3r5Fus8sMlbsZnzum61yuH/IaFzmISOWWXyMOLEXOFkc1IeIuxlZ6HWE98EbRLS7 rhW3wBbmeRHVAxqwX+YjcPUsHl0TSt8dpf/yB/PibeviHDPVXqENSWYX96EFNU5H3u2Gw2MmX/pg iC4E7TymHBjMQtP7FPfplgOqIkxjqcae+ITAutaH57L+V7neVZAizS3srsnCj0NmmZH2pvQVs8Te xpHMkvlMG2SWu8odDj1psGWjsIVLJjAdvN2oAMYSESGyuiwTzA7EkHgpCVdicAVUir2IK4r4Hq7M DjOjRWGdirz34EXY7BFDsGrg/MMx6cgCztuokjvhSgcvC7E/fEJwVNa7p9LYwNH0sawM0AwgB3yq iMfngZaMZm35V9yrXeAKzeJyfYMcPIg9OSReTAKXq8HlJRDfjrK8CkjT+TnFXpE2uY5++4//+u8I cAl2H6m4pvm4Ds+4MM/YEGT55HPuoJARxThRcjhmUQ2xRIh7khZw9zV/LQMut/dhmQLgQFuLzqsy WGwvrnNhicmqSQNhynIHFtE+2KTlfreGSvbMoj0UJSfjjoG9BmaVW7sKFsc9IIuQUbyMZYIjFJc4 ndY7h1ZtXCn89xjKa304ftsErNVP7lvt9k4fnQ3aGl7cIWoWwisDAPdvqOt/OOrj6nBcVRAvKMC8 
VnEIlWU6b6xiG6jv7hnNa1PNsBYz5SQ4Juy+2YeNjLge9MoZNIHL0oc3tfuz3OxetiCNwXvPYoVV lDQb6LQyhb5UHbkyozg88c32GNZWbnel+2u/qvGAeNjYcXguiaFgnv/YBFH1Rq/LfdgUoDZROUTO TQS0AeWwq8Nu+Hk4nxVqwBkUUCnOS2IKRvENvOzLVmehXySP89rQPIBa8/BhdSWeGNijHIRlKlap GwqrPHmy+xgFLpLIphZMF+hEsydnvc5gh+nIkl+ZkMSjo6v34A0gMFWRXq5bO+EYKWTumrkd3aas NCJxBpPLp/Bz4j8//3P3Um/12l5iDhk5hWNRyWbHJFBTFPHujjyfVuvNz7o49b4VgmPK0QfDnLw7 mSWTp3a2uwOeEDL/1fYIjg4BG0LGqXhrghigtd0+gGOo8UqPR3aXWaSonzXEkdH8yKzZPZAt8VtV 9Pu1Q1VXPYSZXfgD4Oo6hBl/EDXEwKGKY36wQ7W1sLKlbJMHdlUkL2gClwmvas7YGVz47GhyUnmz qFeVQfB45fpeVRyT9sCFBlOcJK/qp4Sq5FX9nODya9soL87gIuYH7FBh724bwZhy9EGyjZJt9AiI 8FvRl/2uPm70/mQbvZURQlWHMLH3NkAlSBY28qv+CaMPZNY7rCK8g2nv5XCIIh9+oG3/g1wrfd9O UuEDxYqlsoKbJwSEY1c9l9hXodmth2egMBjOTF1c0AlTKoOtutqX+5fD93If5K22T6Wv9QarCHAC nnhuoriMYJk1RSO2C5d7cscg9KndxXlcTNoD31gFY47hjcqx2ncXVs3tBYuz6DJecN7eWP08nO6+ 0OXKI2/RHuHO6l4SF+rssYhUoZn0+F6a4wuXG3DKtEJZUfYHE9IxuNbw4TtwvLqLIAYXeTLuKkJk Di+p/GrtgnYot7tWGM3hSDAdmYigco93XvDcwimrIbGh+SpMEETyyPBW23SX+RHAxP8sV0Ewxspi qGzUphYkUyiquXbZ2s3hCZ9ahkklcdl1pCAagAni/krE5a07HB12qWHRwCRUzgnsuvKwX20xmLA0 3/+zqQ8T2Twn45p1q2yxBHsq87hr0AXvgYJIaNrdMpFRcoI3Z1snkzMjSIKZoaiUZJKJnLy02emD KyNICA6hRD4cU5jxmIX6AEfKYvVJnPZtotGf9eqIkQQ16Bhn0Jf5AKYK94R6YPUbEHVOI+BgIMhi yn89gsecCYzTd5v9brcuu6ZWAjiXj5LFFRGG9JLbRvOjTfiEj3P4KKYZMJrDi9kEPmNWO7hKBnCM Y25ceIodraCLVytstWvbeXVsi8UtWHhp8wsi8d48iKviYgwk4ZVoxWFwAESxuLLabYNuOGIgBEiV cZZ3ECsx6wTE/tg027rVgThLG6vpC+6avJO+JDSVIwVZpL7NOat0YMAn2hCQEZVzHPuFqYlXUyNz puZF0cZBdJJgWqyp6xUlK6d5pc+TMhr0abBEAtdfbXcWqYjNUH9FKbAb5r+HOVrvu8Skg2kmhu+h ErDiySyDkTjG0Ao5L/gkMIOINxvH+YUJiieIM3qtt5Ur7ao+/my2Sgml4darcPKcLbdQeY0STA1y sSxRVKoHXcAWS/PivatdqwnQ74Xnz8ZB7AKmRsUtgHb1Utdueyx941MHHYCJblFrh9AmNGkxtKlb KlJtOpkDOEIalav2HvEkKCOgSXQcKeaSWWlaSfgu9Gp3Eom3BnHgH+QZ+ZY8qaMFssBnjelsWszm 0/ZHI/FHEFmxGVF+mSA5vRyz5vP4mLWFLL1gGLA2fHC7O4LJUrvDC+4+1WQHxV74GptVePaO2wO6 P7zVPztWHxlfd1NhKtesVQOj92vw/UZmbKrcEdYaLidJAhSdepwQtp7h8jrApG+6KD7bdCmMNe56 
0+Vq0n8eghvGM0VGY/a9qpxUmGh9fqyf3HQBPUNgtx7rn021hVNWMAJcwPkP9Ewu4OKFEhNS//Gq xETp/qrWL4fVj5PfU8b5ZIX1HM7py3b1VyPR7+py77CyawGQqeP8WcsUnRBOYPVdnKZ9qvWm5/QJ OIjXO1FURRClOTm9pJ6KtPAYIzmVygVuoE5FnmQ11CKqyMlCuozllWMnz3SAPtgxBVBeXsT68W9G xpdZqsoNmiE/etzMAKuIPHSqIML1ldjj6TDwLvZ12JsXgdn88sCUC+leaah+35JgwuphcqcowjuI yA6FITIy/MCa1xK/jg67tdKRhmu0a39UjQ+7DoxaH2DP+ibcJ7aV+Q2v12RVYUoJTK52FZBBhuFV gn30XRaXlWpu2aoaMrJKaAN/WivmU1EZpwO98ZS19wkY+0HQ4cbjXgAzYTc2gO3+wru/LPOYAhXp /L9NfIU3jojeosiMJS0VDlHvq9UukIAOBU+owWTBzbn2Bu2Cr2aFilfeVfftFI1jDmCRifzu4RCN wl8MBW9HJqRX4CqCA+zqc90bDRdvkVldC91HS4d++fEMyZwJPlDs1UPHSrXgQC/muBfynKmWFWQ2 OGhvrwMHIYXghFMO+ejFdGlAwSWBaEyecR7+rkdjDsCBqsolcEjgkMAhFhzYFeDwjlBtTWcwh7D7 o4bwNB+NOQAHIxW5j/2UwCGBw1cCB34FOMwv6UdYMQMcWAw4hCGG4ABj9sGBWCdlAocEDgkcrgaH rT+cHa/D20MZDlgfHObX5cuqC72BAjiIQthgFeThfyJYEjZ8IxgIGZgVkgkvJvcP/rj5rhY0CBiM aS35likllWA+iM+DzUK/jueVMV/4xuW32Vl0u9ZdoF0VF9D8uNmjj+xydZbjU4MjBOGmDOPf4lD5 RhDjbBPL306FzJvJPQGmYx+6BzDzU9S1zy4CzCv2ccYHVYXXF3H6mSQ+M8MxAWAufvA1AAZX3MZU 7ruMDsFBk4rpsoYJYRLCPAKF6SPM7HbUUIjzIsL0jj/Pq3NLBhifySuO/9sQBWMOEQYmIe/bUhHH TBTmawJMeGoWl1mWdvfn9lQlEXCmiIuUTFB1Y2tLzbe2HH8Tqrr3Dn6F0U/4ZHjXUGK/DTYIGHKf phvB1yFDt3TF3K5RQwLQxNC+FuzNroFIeeZjYQ9+8i7YCwIS7CXYS7CXYO99sDc7qpl4Gcv28Cfv gT0QkGAvwV6CvQR78R7/PuzND2POzWW2N8aH0U/eBXtBAMDe8BtKfCXYS7CQYOGj2VBO5gcwmyk3 PeR4DVvYSwGfRMFCmO5wzIGbvpX4dWAh8Y4EMJ8AYOp99TfssmUv8Q5FegDTRhrMKIQgiL/QBXAq zpG+J30KxoQugBe/sTDAnD9o+mktAzC36tX1m9Rm/frdBMWMboK36NqXkco2PQTXq+1zudfH72E1 a7DXDdbBqmzcteiNG+5lxGNNzlezq2B3x1bQqpSAuenwnn5AynBXGA4LfsVN7HHrv966ryBTXhtQ pBv9s9ztHWxwiXJo1F570P6ESldSoXKurfvRLRDVs4ir/6Ey5tvosfAm950obKDK42qBqcCvYdsf jvW22v8sfb3blC8HZKEckS+2bNetmh0uWBNjE5DR1RftJWJ10bOXMjKPzjBGC3+pqfHJXsIBec5l 9w0GNTRpZief9/kZyZxkjAzH7JoaNwaUY1kmP6C3z2SRoWQv/eY8pJjDQ26sbx632B7XXihMbAmC UOsfsQIpGs2RuT+6wELEbosUu/xed6dGQW2nuEJoQZqV8ApaKaeK0L6p8RjJuT6DstivqueX/TXK Ioc/o86Itn1JVVxQFmxR2zeM+VG272BAlk9Ze1/P5/7FIP5uEkmOsFh2BR3LarfZoDAOZz0S/xmx 
2Dyh7NeHhDJwIrIPhs4sLBSraoXVos2hzRqLEAPITlfxGMgTvsggoRSLGQZm7gbVDMN5+Xt0r5jb VjNMijgp4rsr4n9tnk4221gRF0L0FbGaq4iJ0Jettl9oSS4kU5MGfDsEY+BJzQe352HMntXGgh0H hSXv7YTmrCJTVezmOqHxCf3RVBw/1np7QISEmvLgTQF3Iqviuk3gbFuJq+POAHRThTs21rGdSEIi Cb/ymXqFlwv/3m2bJiSroFQqUAJY+lHFXfLdtDWY4hrL2DbtQhqBtfMrxFwHRIb6KM8iZ7opPwHH yq5qVh48KD3rto3ac42LMe5I3Jgt8HDGrOtPEvz/4Q9hkkrgH6pue8cpVc4J011nhNXLto0nVOg6 ju3ZUHn+R6/HAgBemBr62hGmdGSpi9sRkZt2zGDKKgMgtV/tXbnaVUfYfApYiIprDfIpSMjh8P2t UBuijeA9EpLPdh1nWXY9CTn9pMivJyH4fSNGH3xxb0C6wU6M4Y1LsEA/ARDXTtfNVR+0fcvQkoqz sisjRE+9Nw0dcrzkmw6Q/Sy849YK/THva3meV3DAzeqIB/x0vuFgFpFtOm+m2wKKYb9DvPit1jus 3y6Blk2WSrubklxt/3b0x729GKauaS9crMvFnqEkfXWlkhx/MNdl3oz5EEpyehWPoSSFILkDw//F nvUadrmlkdroN9BrgfTlTYSULde73TO6o5s2oSSyvYmraNuV7CQIjRiqI6eUAr9S4FcK/PqCgV+f w9qGuHOgvvYikRDsRCQKLucSCcbEpbv3XxGJ3IQtMI0M5yEIZ11p7dOYKe78KgZBncubhqzbtS5b PQQIyKIgJln/n44lXY4bY/FxY4I4js8lWONltdv/LK0+auwRuNr9cNjhHRvau0hyC9GILbmFtKbm ep4WSHB9lI4IPDn3qrueb+4rejI5wXcYJzHj3HQSz6Ka6/7YdpUE3eLgD29ah21dwHqscc3mnJ0l alzfmJMIahmHt7t1R6SGq+2haTxPUd3SOH1742d4a3Eq50iCy8N+tcU6CqX5HmThVk5uk56ghdnO ygZ6bl/fLQQAVWe2I+aHpVeKOnFNpOH5J4LlftreaL9/SUAY0/pegAMNbIKLu4WlB+tGa+0DQPyG 2f0srxQGucEZcqDNy1adU+2RqkTGzN2+v2I4nLKdYH0sD6snWPZGH8AIlyCvephuqhur7eaN6z9r epn2haRzE2GJsOa6YODBT4I9ERGDJDOTvxYw/MDd54j28R58Rcsc0Zv7oR75zH8tQ+CtMJ/bFX8i Wujmcqzz2x0b+GEYahvnpLlt63lgy9jZ/BdsuYhny48aSc0EGi0bewoloUxiSCvLPjTqiOVFJorm THVuNwJ7jU2HXYzkFDlQbbN+HnkDZdwCgyAjXwliLlrSLUuyfQq34v+zdyXLretI9le87+gojBxq 8TZdm4roRUXVBzAAArDVV5b8NNz3bn19I0FSIinJJChRllypxR1oOQGSyIOTiRzKzXp1KYjHqMwe d/Fk+i7uObPVMSk9TCfwIgaf0ic0AMZMvvx80m+1g8FtU3fxxw2i93xApFkFECu1Wm+X1gIMCthC 0rhQOD+vsLc182rLkyGuk8XtcFxlTrUEAkrvm+owkUV4uLKm4eudGy3jw1JmVPJXu9v9OlXy+uzA 0aOSZ5Nr1pBESA8YUXl7UlpCI2vldZQcxpRffXbAWMYHDw0nJ3mDbL8qNn4DUVvrqclqu/YEe2vf w/Ydjs7KKFV/ZP8/V1KEKImw+5ZvKrAwE0Jo4yTNEBunmOQQLOyVqTnGSYHG6rijiSQrg8/3p9s2 YoKf0sSKcZY0voK6/SA88ORhPAR92Eld99OCnYlHlgEC0jOwczyyZFQkqgs7cOFK2EkRdhB2EHae 
HXaSK2An/5TtMEpyknVDLv2FK2EnR9hB2EHYeXbYSa+AHT0AO5plXdiBC1fCjkbYQdhB2Hl22JmY hRkgwHwOO/6llV3YgQtXwo5B2EHYQdh5Cthp1BNGVsqRFuzkV8COi3cpp9fCjkPYQdhB2HlI2Pms 5oTR5FBzgnlVfqlLT8R3X+DnCl99FnQGEfBXdWGHMZugs1qlmGGK3wt2qlQcnqSmnCsuNNxPN7OD xCd23BhpuPLLJqsVsJ6VhFjQJDLC5ltFmZ1PN+FT0k1y/3xDuokHlC1U6qlDFFx0NL3gWWmqNKqw hPbbt+LDrsxi9Vq4jXoPNRxSE1BGRcUh3bBUxQ2zBwTJXKpb+/S2iuoLmWN53PbMbKrTWveO1TNz HdIj8sh0i4eunjlLPs0MEZu3rDvFc1emrAaxpVlsy8LYjYWCgqH4VBoZMXu7TJOciJoIHFbJgxEK rbZvbUJxyX3iCQWExgReEZ1WS+WI0BiWn+TZDiJ+dwia9S6c2jH0joSCMqYFF9ls/eJuGreLJgya MHdBnL4J8wniZJNNmMyOTOQ//Mr1Jowfs5M3EyZxv9S29mc2xLlNaD/aMGjDoA2DNgzaMEcQQxvm RjbMP19eNvvVykNKyAesrwZCwchLzSvGEYo/3VbvnTlQlcaIgD+M06QluQ7gndBMO8u4VWeaaZ8w jd7/x6b4UpIwR21JemPKU4lfUOF3puSgB87I5ZKJHDS9Xl2g4TLkGEWRKBATyNhRDImXAiU6EgCK H1XCHyg31LpN4yD7Rm2v/WRY1pkMmTSXWRpf++e8/bUq+5VMKzBIDGuBQTIVDMBpYZMxYNBaCDz3 i320p0QQbkgbDMKYlLS/kcnEfYXdMhMYVE8IVGW5fi202e42aleUwA9pSOqOKnn80NCS0rRuhxDW KlhCcIc0jaKtIIdlXTkkXgyCywxOEZZPc4pAqR1J73uuW43ZaWgUJnG/YiLHSQx2P0GnCDpFHsKR cUN3C3oy0JOBnoxa0tN5Mro2Rgbb5oEGcPpSs4FoGsCssdmEbgmD21l7tj0aAGPy9tkIF2n5nbol 3LhOOFO5qrfdeqdUDFRRuThdVEymDQtoqiiFMnqxfcLmKPUXGtadLfWnQS0fp9TfRi3Mbv1xVi2N U2lLLeVLrZ3RR5Y5cXc/svRjdo8sYRJ3Zef1JOZi5w9srH8zan0vibdrjXLTqoG3E3XbkoGPWdzv 5m3DlEwJ6QUISaumBAjd6A5nCTS6yV7+FHX5Pk10cJlpbbmTHWIiFXHNNcOWK6Oq68IQJuteaLX7 qC5Yfbf8qtrDL0k2V1wiOrKeypGF0T3oE0OfGPrEntIn1mUCovKJ8XifGCyfcT6x9goB0j/WJxaG MG0mEMbs+MQIVbT8Tj6xG5pF6F77vu4102L6Ly9/a+XP8EwIe/R607RKcJ5QS4E6l0s2R4/gg0Lz 0paCZt0xhelf0Pc9/K5mNZOG84RKVXljXu3KbhZlsV99LPevfnv7uShtQ4biGvTxxO++sL9dlAn2 OIuifdx5kWmQuataO0An9c3iI7RRDcov0riAkttb+C7zS/w36H3qJW3s73u7DRgX+iR7u/PriRZP iAhF15tX05sqrTxxLi5sKWEefgFb9vp9sSv0Yg0gBbaZjuyD4Nlv1SHSyygg9azpbRuyzmgc9+XC Ua2qMKiintti7cl5PcNgzbKocu5epNT5GZFgzWRh54gL1CK8FLW4sC9Ci9RQ0uRn1aFOEoi4kjIu 304SGbpAgNif7xUzbNoXgzwTt3cIlXNb+dTakuDpxbEA4V8pzCuI8XtQ+Va1VQbDkqZV++K4FQNT EydTI1Om5pqeIAdJMC0WZEUuY5qV6jgpraCxn7eil7ZYrNYmLGgD74HruKbIwiVp5Q2GNwtQqIGT 
wTyDdzm2CXFGjasMkZW3yXU1u/AySmjBwHScrZkxFnJGj3d9EBjaMpk4V67QNAF3hFZLtSptYRYb 8CHBiyk2Xk2WC69+1hQrcHf4vTIeIcB7VyYtOISm2oXeO+etO3O0qGxQQhMHFTLJkjpO/CApJbCd Kh5nL4rcaVfvfRV73qzf/R/WP4IA2zxgjyhjI9CNblAi3Hjb1xDWU2QbmyQTQb3BRwcr6uC3gCa/ dLis1T32v9v0Mua5JVQ2Bxb/tpuwn5QhVtdGthTPtG4Sh5sHpoKjJw7AnsAVEGEoTO8BrvwWfWoo nETJnobNjrZFmMgkEVl3TDAUugNkKRoKaCigodB532gooKGAhsJhGdNUl58bCmWOdgLaCeNmKakM 6rdfvasPr88KnigPipzKqMNttA3QNnhY22Bi+wnP05lfgyNsA3qzQ4RqTGH6F/AQAW0DtA3QNkDb AG0DtA3QNnjWMwRvcgjjTkwOkaHJgSbHNzI5pvaAodZSfdbk6HxDpMcoI5oKbx5EHkdkIutJxOMI NDnQ5ECT4yASTQ40OdDkQJMDTY7WvaLJgSbHV5oczYED/GGYPtYJpBmdeMpBE83lfVMlqjHbpxzV BTzlQPMAzQM0D9A8+CbmAaY1oH3w8PaBKFkmfjtWdWiWvIR3RYdLraOFgBbCU1gIE5OpPTkXnI6J g7qpheDH7FoIcAEtBLQQ0EJACwEtBLQQ0EI4vBK0ENyMFgKSeiT1D0vqpyY+e+oi7lwhqRqzQ+rD BST1SOqR1COpR1L/TUg9RgUhp0dOj5weOf0nnD6MeDZ7IJuasEyyVI8qZtRiw8xZy8vxnL7+fnfM TtXTcD/36fV1mNUzcnrMHkA74fvaCSY7bycEBo52AtoJaCegnYB2AtoJaCd0f7xdWvtxsBM+6X/C gp0Q2qDE2QlEW5Vwq+cI6LlkaNRjJv0x794TmKVKzdX/hKs80Ea78kxxb4u3TdNcK4GdIrKVI1dG wqZdSwHSE8iOCxhOoiiUSFIROpd41Fmp1TosMlj7oIxpFGTDvEKXl2ZebXkSIJZGkgquMqdaAgHB Qh/GAF1xTda4slUzOq/gnRsFxp7EmT7XK3nqV95fCf8rlb0f/+vX9p+/v/i/39Z/vPwL2OzALwz8 ePAD++Cln20Xeul51+AQ0Nbn5Jf/8T8HMTtPZvzfHwvz4tTuzW5eyrfF0rz8Wu9Xr/5/66Wxm4FR FqvF7ii8g36eNtnDyWdwj9T/6aDf5fnXSMSYSNS5PpCXvSTQB5KbAzRdHuKSgGrMVuvlehIXWy9/ fheR6NedRNN6+eIQJ+h3eTI9iBnXenmcuHGtl0fKGtEMcqykoWaQY+9uvFtlnETOaZbxmqgu7J/e 6lmtoaOw/ydYUqF14vDy6opUmgZyvly/1jY3sOtFGf7/oYKbiofGkSZKMvM2WiDr/16vKmveU/RF uQ1uoBZQj30pwz2iR0pKbElJtVAOghJgvlTFCUoTK4Gwmo+qEWo1GR0ng5Smui0Pzz+8UbN78+tk GTp6BVOxNCru7ngWPI6F/1Ruhaq7Zm1tB/uI8riFnOqEg8z31c7fqhflF9zHYhPcUbAyMhF5zy50 6zy54xK2XRu3ykb3svyae000VaBe5Uexsn8ENfCSLJgjLg4rRzSIHClpBOeZylF+7N7AEDTN1t7a p0SeMNLa2tlha//f/47Y2j3x1Cx3rG+2iH5QU5Z0GjtCC/bRQ1AtkzQj3TGtOTGV+B23dsqYltKU M23to5vUjxTHsyQBJW9WBKywALIsDmSJ8qYDKBDQzGKz+Kj9rTRq++Dg4AMxqvxYFMbvnp6DFrAx rfX/VaoEm2daRt6jdKJ7j2TaLea2d4tkzC0O/Ph98bpRwDX+Qk55tkl1SxnrtCE2RRmNVYPK2Ouy 
OlYZaz1QsqMHYUz5tcoolRWpzZ9DGasHGDbJVVj54eQxHDIkcSSbCVMa2B2bpVVUSx8E5iycWgw9 kr7AkuvzAskEcTzLEnfUSOCaodJ93KZ9G8yBybCsMxkyaS6zgMOP7drtFpvfzSk4UGs7O7W4Ahyc o2PAoX8hzgivt/rDmPZMx+V7gUPEEJPBoUwz4JSHV1gviTzSPC2rs7m2GK4mybnFdFB1x6ruH2pX vpn1a9Dco+oetemounK66nLuThTz831deMY3vJ90h2Cke8Hq/pgdHv/kqls9IdjXw7IqNna33zSu IDNIibruFZ5I0JdmNdRSIjd0L0WzrhTJJ4m5wWQQAuKpPT2h9pLqtgs9uQIC0vtTez/mF1N7kkor kdojtX9WcDhS+1NwyErFWuCQTgcHacZR+5to7mHMr6L21RNUxpBc5TQfcvAgtUfVvYra07PUnrZU N7tCdd3s1N4P0aX20iG1R2qPEBBB7dnJ7p2qpO21z6dDQPYF1D77cmqfUpkgtUdq/7TgcKT2p+CQ myw/ggMlV4BDPo/X/qzmHsb8WmqfJo4nCTEuQWqPqjsntWdnqX3LKqf0CtU1s1N7P0SX2mcGqT1S e4SACGrPD9FxVTCJcZkTrYM7yqZDgKYjqH3dbbDZDK6l9n7MLw/IMc4gtUdq/6zgcKT2/NRrn6Zp CxyuiNbT6f2pvU6/ltrrvEyJzcVgwDVSe1Tdq6g9P0vteUt1r4il0/ns1F7nPWqvc6T2SO0RAgYh wP60q932L7XT7cQx14aA6TF5gkrYScdBQPUrjiYlHaRsvd+nvTHT3pisYz3Mu3v7T/ODmXZvLm0C pU+qfH5jl+oXlDxZb34UIU1TQdqXjkvpEE6Gkic/3yHjy1sKRu2acjw8ctlLl0DOJ0zIbo5sXEHd HBeXLBdZSmT8AwzFpfozJFMmiIATBTj0AuDIFuAk0zmHSOMBJyXRgMN6Y54Czl04BwIOAg4CzueA w0YwnOlRhaSMYzhQ78Ioba4BHBgTAQcBBwHnIQGHjwCc6bGQgtJowFFED54zfG5SUQScaMBJJHWA ElVRE79cD+tsMGQD4QbhZvj85c0uP+zmpQc3QVWUTNqhVdPjLgXl4+Cm+is9wMNYLGg+vTH7cEPu 68FpPnOdv5Ay1IYq1+8fSxtoCA91YCNLliAwIDCcAYZQWtrUvt1+OpVpBWSLKzwtdiQwtJe9y9Tg I6qH8CpCpdAy643ZPqllIiuzW2VsfQ4M3THtbFEb8IT8koAXWFSFR7XahkYJoTBgVLmfuGYOI2U6 Jiks2lrWdql+Vo3pgSHlcXW9EL8Qvz7BL1rj1xEAiMs7UWfiinRQm5/Fr+43eAe//BpTo8DlXKPJ WuKR2PAzEHkvYjNUNW+yHQVP6HMtzKO18NFxzINE1YBCL380cwPAaNQhMooFQRFB8TIosi4oAsyw lHZA8YpEW6uHSR2U+G8t/lKJ4efcDOEsM0nZjbbzY35rUlc9od+gBvV2UZjF9gNCXIryPcS82hDz yqI8QgiICIgIiDUg8rOAWLYB8Yq0Y2sGWaKxtIMtqlR0LEusoIeIrCfx1NuefgFLnMn9xZNEGPC2 l+734sfCW7rB7Q5GLoAhjwz5Q3hAeDgDD1A4+dDdtOcdZ8fOpi8v8orEYzfyMC5SsXrf74Y4O2p7 BCqh2Z3LhY4K1H6AEGcEBwSHS+BQrNa7hfs1AA5XpDa7kaFBNwUH2QMHx5P7GVMIDggOTw8Oaqea tIhzzKE8gAPNpmdGkLhz9ang0Il0JpT3wYG5excaR3BAcHhycKDjwGH62TqhI7MYbgoO6Qk4ZGhW IDggOESBAxsHDldkHNDzB9fzgkN+Ag70TIkEBAcEBwSHy+DAx4HD9ANcQs+fV8wLDqYLDoTqu/cv 
QnBAcHhucCjU/s8x4DD9MJOwu5xWdMGB0T44KIdmBYIDgsP404qt3SzW5jw4iFQdwSGffpTpn8Ek cBjsVN8dgpPeha4qc57WqUB3AweZJG620C9PhCTE86/sn7tjeBaoM41qpu0FyRTKJur9tnDrTWFV +QYCvbQkNKePi6i1WRoaAYeVdVRjpqGkIydxjb5ngRo/RRn6mvemSCZNEKFmHNR8GLfcb99qqPlb K0yJZ0K0DkaZmAw1fpHpzA5DDY3mIfUQZ0Lvw5g56V04gxQzQU1rVrMFVVGpIKewKC5GhUJZNha1 0GaKNC1dGmTuCuURwuPFdufXcBUBpiHPKY3TKcYpBVpirFP75e4s/MShj8sYhTm+g6SN/X1vt1Dz VlDAbpnHtl2fAyATIjRpvZreVKkuQ655VDSdf9+KQTXd7V6/L3aFXqy9KA25ZzoOczNPjfOwHL2M QhlTfKjXsGIo7DHURrFFLhwNxPNPty3quS3WdmXqGVILzy+NFCl1fkakF5fBqi7jNEUSXopaHNxq AWVGbFGuVz/tBl6IJLDpSBkX6i09QZG12J/vxR+bxc7WT1KBPBNZBUXl3IbX0pEET2+QyfUEGQPz CmKKqsYBSIOQcZqGytAqbsXA1MTJ1MiUqVXR52Z9lATTYkFW5DKmVYnbelJalT+K7WL1urTFYrU2 MMMSIIHruNRt5nJOAmIt6zzP7a9VCdJAOaKWiEjSStW2/h2YvZ8ZCFzvYc2FitqR7yGjxgXi5SdU bHV1n+G1lgIeoI6z8zLGct0slPD8DgJTAws4rhiG0DyFVQKSXjfrvYeAILm+X3jFZWQZVJ3mkJ5R c5+GSdm4LAWhs1DX/CiFcj1JDrOX72/C7T0U+aUJCc/aLdWu2AJ+//0ff/c717bJDRGRGcSPTKZ/ bP9QH4ac2O0Hfnsk05OjDBm0GD1XIP14fzKlB7udlt7EdkJpMrhtdAQ21X4OY5qyRaZTC6Vusvva 7czYhM+Vxxpkg93ud11Vlna7rbe47dtmsfK/A9tJMOPLuAUrLGPB8A5yIMtKA1QIIByewEfK0qEn Q7XOQAyBXC0RuRXN4ykUVnHanhyZNDXErq/ELjoGuyZ3fmHQYzUSuxJt+LCv61Psci3sYh67lM3M nQ8kELsQuxC7ZjtMXaw7AdwH5efeaFWtVHkmJjemYTq5TwA37Y2Zdp2YGcNICzxMRbD5UrC5GBDO O0Rpcisdr/j3CQinvTG7YJPIBGM+EWweC2weHhwuBoR3wWFyKpnXy/sEhNPemH1wIJhniuDwWODw H8dELgaYd8EmuQJs7hNgTntj9sBG5Gj2INgg2HwZ2Pzptsv167HGdEs5s9x0fCyTU90YdOn+tPpW +AZrqykXQtNRncFBQMmlcWk3UEx3AtZ5Qg3T9wOb9pgzBYqFJ1RHlOw2arWFfhd2Zwu1WP4XBFcC 6Aw12ugFqVAmdC1S712xWJv1yrZr/gms+Yco9rAodlJpOqBY2UaxyTl5HlFG1AHqdigdj2L196nT SvfGPKIYkczkkt/PedMe83lQLDzEUxQr/QwhGmfbLDoZhWUIjgiOTw2OJxWnRZpo0gHHyTmJDBq2 D1G8/AqKRzgRjud51pN4BEfGGBya3y8n8UDxymcCR0QxRLFnRrGmTHTjXwLOlWfs2AyJTa8D6xFl RN180muGRMxwLHozBCRgyaSHYrqDYtRwS+7tguc8N3N5xaonVMXegzLvYUlUbdLioCvMEaDLL4Q6 K8OsD7QuJDGFLCbsAosA9pAAZtRO1a62no1q25626bVqPZhMcutng7kbPSM36Y3ZhZtQOf+eAFbP ayibbzINs06YmjPVuWqht+xyuS6b1YHefcScx8Uceg5z2qafvCL6u5wUQXkd5pT9ihOIOYg5iDmP 
hDns1FDLpGwbaldEbZeTavJfiTnye2MOIgQixF0Rgp86pE3XEroi1LocEWrdUnDGqFVMsbgAJ0ZY b8wvQoge7g25cK9gJVz0WAmkte/sCkkJQs7DQk5o2WnfiibMqdETGFZxkhwgh18RwC2otNl963KG MflBIm1g6QsgZ6YzsBvGVPLSMH1o4Go3m/WmeFMrswyptCmU32MmrliQlyjoJYkkXuBjQcRTqDQ9 q9LiWE2TJ9MjF4XS91dpPyaqNKr0f65Ks7MqXdqDSotk+i4tTTpOpTtPLVcRrYClUcplvTGPKs3P DHEvlZ6rWEV4Qp9z7zyae/MyN3UpubA0Fju7Cae+oWplaGoOFQ2TuGNkhIxvCBn8LGQ40oKM6SzA U+z5IcP2IMNyhAyEDISM2SBD9CAjfJqcygoypsfDSmvOQkbnG8ZdBRk26UOGQchAyEDImA0y5CBk pNNjt6SjQ5Dh6LWQ0TdM/JgIGQgZCBlzQUZyzjARsg0Z08MgpDvPMnq6dRVkuD7LcMgyEDIQMuaD jPQsZKRZCzKucH86Nz9knLAMh5CBkIGQcXPIeDe82KiFaQyT1HU/R8iY3oyYJTwkePw/e9e24ziO ZH8lHxcYYJZXkXrpp3meBeYHBFIks7zlW8t2ddd+/TJCki0pnZaptF3OLPUA01VuO0hK5IkLT0Rc BxnHn7jRrNiBG6XZYEzWlSittPyhub0M/kbuVr4lqAwb7+Drg23Fc+jrxU3S9hRZFlDMmaYx2Ewp sWkM18bCbt9WfmuqKG0TgWyB0qAXkEzCGkGY5JC/vHIn3pSDVdLxvhX3p3YJcN2zwfTIpNnNeHUt XjHEqzpK8p+Xl+qwXi/Wry97s/teA1QNV3liHGXlaEfwJSDMp3NKubLjQDioABNoPt7ohxObaUJo 828yGJO9ZWY8FgiltdndeqviE2qAkMJ+LUvoduXS+Ic3JJXMmDpj6qfCVHIt9GXToS+/wgacAn0q OgcZC6H+dx/6cjeEPkGEfSD0QUqCKrPRvqgz9M3QN0Pfr4A+caX7m0+/yuc2JLm/UIgqcJkSMYOm 3H3oi2P23N/4Dfk497eBJ2wafSfou0tD6tmlnjHwt8RAD01+Ad/+NUAVf7o1kGQyN4HqYEt5iZsA 37Cmi4EmiPHCeN20B11qoQdjDjMrmFYPNf946UtB7xUCzKg0dVPqd1EQTjZLC8ffBVmjSIUy94WJ 8PAD2qLHPQzyGIP+yEKlnSkWcgp1CPfVT0DEN/2zJdxEJDYjZJxSSNV0PpjDcn8Wz9JuYIKqS45h D+3K/3nwOwDuTMPsFE2aHi8DLXl7T9IIKwJMjAVoF85lkmLhIb4VXmOt27TPTVi4E9L5WDG3N1pF 6gFsB5aO2zP+Pxz//w47ewiuCZ4OCiO6vIv/k2v80HhWvblHHlxviJbcdhpT9lxxWpKHRj7rMU12 J/dfZIo0F6dvDE1t0g1NFhTWvd5WG+wWe5IG+ywJELhkArPDm90FmCfxGCdWm2WCuZ4Yki5lBoMU MAA94M6CAc98Bwwms85onjufpVFIOM/jZk8YgkVzkwzGpAOHWIeHtKofrGrsDU01BvEJ/dFUN7Uu mldmX5TYmRk0Oh8r+vN5oEVR5WtMwL3a8lioSkv9jXLwIHbkkHQxM7hcDS6H6EK0dsYbfprRp4z7 jDS5dvTlv/79PwngEt1GUnJD9eDQaTZiaTA9+pxbKGQkY5xksj9mXvaxRIjH9ApqwFNcaS9Nz7i/ cUjL5gAHxjmMZRXRlzn4NqIlRqus9YRljntwY7bRnSy2myVUtmcOnZgkOYp7Bt40+EJ+6UtYHA+A LEKO8f4GkgRHKC5wOk2wDp3HtNL4H/FHl2a3f1lFrDWv/qXyW2/23kVtDS9ulzQLETILAPd/UOcf 
ymYvdvtFCfRBAV5sloZQShldxyxcsdxsvmM1b1tOcPFU5iXEFNy23oe1DJu2d7xFv7UoQnxTm7+K 1eawBmkM3rtKFVZSUm+g48oyDK2axJXZjMMTX633cW3FelP4v7eLCg9IgI2dhueSWAo+9Y9VFFWt zLLYxk0BahOVQ+LcREQbUA6bKu6Gn7vTWaEW4jgRldKiTTZnFN/AYVs0OgujVjotdkN1BLX64cPq CjwxsEc5CFNZqlK3FFZ5DGx3MQriGolNLpjJMf7ljrF7o2CHmcQKYEpIEjCe1HnwFhCYZr841sVI LrWv57b3K6yrD4xgmJwew8+R//z9fzeHam2W7pzloIhsLQcq2WSKArV5fl2MovsUtR5X6/XPWtp6 1wvBMeXgg36K3oPcktFTO7l5AzwhtPwX6z2UhhOwIWSainc2igGztt0HcAxN3awhsdvMXWr8OUs8 GcyPTJrdE/kSv1WBv0tRUJ3psoMwk+uAAFzdOQoKQ/SioDhmPwoqDH2kb0KggQuPPtKdoqBwExRq a3u1cfUlEChFUK9l2i3GMwc95njqDFMj8Vmp3Qmm+GSaOimDvWt8lgErvbTd+CyOSTswhV2s5vjs p4SqOT77OcHlspel8xO4iOlMICrcw70sGFMOPpi9rNnLegZE+K3Ml+2m2q/M9uhlvce3plmLMKk3 QGBKEBU38vAGaIgwsksEFPEdjMdB+0N0m3DCB8Z1P9AmM4/xsjofZCy/V7px/YTA4NiU3wvsIlXv 1t13MGGQJ0190jWJYFmmYKsutsX2sPtWbKO8xfq1CJVZYXkCTpDWZpNsGcGUs3kttqUwvvp9FPra 7GKdRkl74ruv6MwxvJvZl9v26qu+B2FpHp3iOefN3dfP3fEWDYO3PPE+7hluvx4l8U4tQ+4iVRgm A76X+vjCNQmcMpOhrCT/gwnpGVyQhPgdOF7tlRJD2m3apYZQHq+7wmLpo3Yo1ptGGNVwJJhJzHDI dMDbM3hu8ZRVwA+uvwoTlMiUTZPo6l56PyKYhJ/FIgpGqiwyZZM2tSAqQ1H1Bc7arXav+NQUcqrT 0vZITgwA0xIgGXF57Xd7j+1vWDIwiUxzAruu2G0X6wKF2m//qAvPJHblUdywdpUNlnCGUP4sN0pR JLQDf4f1ao6sV6rkRC5KdDOAn/+WizIS74UpXMlF6TtLxzG7XJT2n18QSLmTJcJ0hB6GOGZxk7Z5 UdBHqr4iR0Wa1hz8mbX8F2OmvCOx5A4skPJQVX69L0IdhIKVYnpIkkoBVgHagMgqaDcG6iY6mo0z CNvFh9YE3ovFYQ07Di1TwKDEuFhmlW0ktVMyaK2ZNH3EMk9I2TKeWlHcCXhO4w3oBuSpkuuhKDJF kObWthYkJrsY96OW+COKLNkEds6NuSZMZESBhsMDuVnXRBGLZyCD95CYgpRpw/J6yYPVWlxtWnrP DbkdPHpWspFU7Yvd4hXiuk1MAUk/5fNQOzpq+EJFH6oynqaH4aR22abvxhrU9NJmlEc7+M19bS+v UUnTq0pBRa5tQl4jDqFI/wNn30r8Ohr+xrpYSMsNHqwfZW0mV/F0mR0gXahvFFLbMN7Qg5dlifw3 mFzlS0APhjc4gv1qlcxlmdXGQlkBfbSAFpbHtSL5k8o0tzHYQFnjsmB4mXgM4Ka9AGbjbqyB1/+N 8K1UQL5mon9xmxBusJ6IzqLIhCXdK+JabcvFxv03aVHwiBpM5hxQ44SCkytckFCGKd20P+Ln4Jg9 WGRCPzziWlfOuRsK3qoqz9wld77wvQAO9GxCTi5PtFqVk8ngYIK7DhyEFIITTjkkz+TjZU0ElwQu fLni0UvjZjBmDxxoVvoZHGZwmMEhFRzYFeDwATaIoRMsh7j7k4YIVA/G7IGDlRl5jP80g8MMDl8J 
HPgV4DC9aAhh+QRwYCngEIfogwOM2QUH4ryUMzjM4DCDw9XgsA67U+C1f7cq4wHrgsP0IiKqPFPX PIKDyIWLXoGO/xPRk3DxG9FBUOBWSCaCGN0/+OP6u0bQKKA3pnPkRWWZzAQLUbyOPgv9OpHX3yKX 5plDrt5xfGpwhKIAz5D/kobKN4IY72q6UDMVMm0mjwSY1vowHYCZngVjgjoLMG+sjxM+ZGV8fQmn n0kSlO2PCQBz9oOvATC44jp0/9e2JY0JDppUjNdgmRFmRphnMGG6CDO5lR5UDTqLMJ3jz3V5KicI 4zN5xfF/H6JgzD7CwCTkY9vB4JizCfM1ASY+NYfLLAq3+Wt9LOkCOJOn0Vh+U6jKLkPV8D8neFvZ ZagaSu56Pp6/C1Xte4e4wuAnfJjz/2aIvsRuCz8Q0Ld9GCPO+jGoOruKRKgaMYaGQ7wPVW8mMz0U c1FU/WhAlNl9L6o/i5anCHXQ1JCiNSIqAUBH1pcKoGMTSwHQy1NLANCRSU0A0MsSr4O9yzKuhr2x qVwBe6Mz+dWwN1Kw5QLscRVSYQ9+8iHYiwJm2Jthb4a9GfY+BnsjrOb3YY8EmWrt4U8+AnsgYIa9 GfZm2Jth73rYO8c3GKExX4A9bc9be0N8GPzkQ7AXBQDs9b+Ria8EezMszLDwq60hTUYIzBdgwY6F 6SHHq99+Uwr4JAkW4nT7Y/bC9I3ErwMLs90xA8wnAJhqW/4TWwK4c3ZHRjoA0zAN3ktPvRBlEiSk l4kAkEuzO/pZqCRAmYiz37gzwJw+qIv/3wdgEhoLjLlticmol3dqYvLo5bldX/7p8qRSCkyMSLq6 wMT9AO986xNxtvXJyGquajEyIoOUrm54slysvxdbs/8WV7MEf90qcNhLl7Qh07uDjM0vYNmfN7Mr YXf7UQToCRNlJmBu2OMVUoYbho+Bhbq0iaWVmBqRlVpiamyXpxamuCwvC3WL75X5WWy2Hja4RDk0 aa/dCvOSm6mMrM6UMkPlXDn/o10gqmcx6jv3JSkWGvZYfJPbVhR2e+JJOMyzaF/Dtt/tq3W5/VmE arMqDju0QjkiX+IxT6jeMWLLJBTbSDRnVhEZfXXWX2K58h1/SZFp5gxjNA/nOrAd/SUckGvm2m+w +Dai6eBGn/d75kw9Zlv1qnagPFNKjpYPv72/JMes8Nlf+s3tkHyKHXJjfcOtU/Cco1Zd7/CFBWd+ /qOp0peoLG4rzASRYWJLFIRafw81Kik6zWkeLje5hS3g12hiF9+q9tRkUCkrS8N3bpyEV9BIgUZ4 FbyApst5os31GZTFdlF+P2yvURYa/ow6I9n3JWU+oV3nx3zfOOav8n17A75tOvr1Y+5fDOIfJpFo hMWilhWt+3KzWqEwDmc9Ef8ZcVif9SgO5ECh/zdFMy/vM2KUg4ViVa24WvQ5jF1iPUIA2TdVPC7L EyFXkFB6WC/+jggbpa26VXI1NkpJjJGQzHDSztEsNscqh06m+7ezIp4V8cMV8Z+r16PPNlTEWU66 ijibqoiJMOe9tgtakgvJslEHvhmCMYik6t7teRyz47Wx6MfBeh4dhOasJMMqdrcKQuMT+qOu+bqv zHqHCFnYQ4BoCoQTWamSJOJsG4mL/cYCdNMMd2xqYHs2EmYj4VLMFOqg1+qkiQJCiCxU3i8XO1R5 CiKxokyLnd5LbGKX8pFAI9dl3SuhiFAeD0jTM2G9cWA6aLCV6Ju2Qw+1HXg8ca7ps1a4RcUKuA2I f/DrfSbwD2W72dNULNS8xsYC/ZrX2Mr8Tc3ry5IkKwNvJEHsHeEvTg0j7whaJu3F3NAsSa04PrJj XGaPNbkXm3IP6j8DmyRLAvjPYZLsdt/eI94QYwXvmCR6ciBZKXW9SXL8Sa6vN0nw+1YMPniO2MCY 
lzOdhjzfZ8/2w/krsWiMAiAuvalq3QxtbhT6VWk+d2lR1bWthuomYRqv/MbpsnfU64k9kB6r0J/z 9pZrXcIBt4s9HvDj+YaDmafJup1uiygWWH2hH83c5QaruUu4B35TOO2XKcnF+p/7sN8OmzXX5DFD O+SxNjN7gpIM5ZVKcvjBB25b45hPoSTHV/EcSlIIoj34PAd30mvYVosmaqPfQK9Fo0/XfClXLDeb 7xictjXzKk1S5ktK6qd1FIRODDWJU5ppYDMNbKaBfUEa2OfwtoGFDqbv2cbsRrCjIZFzOdWQYEyc u4m/ZEhoG7fAODKchiCctYW2j2POLPSrLAjqva4b2q2Xpmj0ECAgS4KY2fv/dFbSeRYZS2eRCeI5 PpfojRflZvuzcGZvXiuzOnW0BW1DfaJxC9zExriFJKf6sp7maOCGJB0R7WQdsvayvr696MjErosi je1IFOe2lXgSVV/+py00ud/u2NmZVvH6oTaJoI5xeLtrv0fTcLHeebxvp6huaZq+vfEzvLW4pBbI c9jkftbOwkXz3L25W2B5KbKTtSOmk9TLjHqRzDscDzh0We4nW+Y4pgsdugON1gQXDyOpR+/GGBMi QPyGuf5Ml9mxJbkHbV406pyagKZKIoPu5nGPCU1bf9kRXTnjVu8wknKtToykXNKpabFEOHsdNbj3 k+hPJDCSpLL6rYD+B/4xR7SL9xArus8RvXkc6pnP/NdyBN4j/dyuFBQxwtSXY23cbl/DD0PibVqQ Ji6VgOG4r34W+02XaCxhW/BU2qfODblsLefp1vKz8qqZQKdl5Y5UEsokElyZSg1w3TZGqXMlmqbn bdiNwF5j47SLgZyoKMA+Xn4fRANl2gKjICvfCGI+WdItC7R9irBiWW3W7/GKc83sSYtn07W4DtLb a7V4+/SUCaNX8achgqV9XjGMmenz36iHeMT9ZFz3/bQ4PCE4haebLQzfJZLWnpeZH80KoXSNM2uz 3uyW3sMqBWgilcaoi/NCFdnOqytPIpOTpSlKbnQwHYEA9oe25ExiZR9uvGvN/t5Cy3R2yx2x4tXv 9z/fYkVzBRHoCSv05EI4JBPSmTSnXEpPaGIBvh5WwJjyV19BMKb56N3j5MxxkB13RRX1kNn5aOGs d5top+/8Cq0AvIErk476M18jcCMFki1QiZffDBpzDpm4aZLuQLEzTHLgHMfD1N4GKbCGbdoNR6ZL DB3/CLtWDIY7XaqY4Ekbcmh6GsIDz54m0DCEHRX6/3RgZ+LNJ0KAOgM7p5tPRkVm+rADH3wQdtQM OzPszLDz2WEn+wDs5BetHUZJTnT/CiJ+8EHYyWfYmWFnhp3PDjvqA7BjR2DHMt2HHfjgg7BjZ9iZ YWeGnc8OOxOTOREC3GXYiS+t7MMOfPBB2HEz7MywM8POp4Cd9njCyMYE0oGd/AOwE9JDyuqjsBNm 2JlhZ4adp4SdYemK96wdFo/yS1PBIr2lAz9XTesSdw2I9FdgwvsCYMwedw0n8Tju2mkSYQw8J8PO jeg1t4UabojDtD04gU3eigROaZbI1PlSbLXzaSt8StpKHp8vpq1ERNlBxZ+GoxCSWfmC69LV6ViY ZHTYfSu2fu0W69ciVGaFtSCUQ5gxSXymG5a8uGEWgiA6KNtR1LuaHYgZaHmafmZeWdVkZ51qcuYW 0yzyxLSNp67JeZe8nDswP29ZvyqidqlYA2JLt9iVhfOVhzKFWMRKJTJvb5exkhPRWALHXfJkFoU1 u28ni+I/Ly/VYb2OkIKc3ObT2qAAagzaFdcZFAmmip5sqmh/Zd7v8ScfN1XimH1TBSbxUFMFko+9 lPpu7e2e0lSJNoZjYWBjsJBPsTFma2W2VmZrZbZWZmulI+nTWSs91j9VNqMno4KRl8a2SC4mwiX3 
Oq2aeJ3JdKVRQXEI5/RgTN5l/Zt4Nr5SVbJb5nrdtrQPM7lpjIFGfxsGp9qEtGNtmFRtJKVNfMLM 19TSvvfIzi1NG9ccZudaOOHPk537d9jZQ3DnTjjLXa47J7whzU7oiq019+ZMV+w3J3zw96uzc2km KJHd4CKOKd9KfGiEkwl6vxP+xMm0XDKRg8ptdheoWol5PUkODYhBx+gkhqRLgeoaGWjs73WuHgAX lKlVabbTjfpXx8kw3ZsMmTSXu3Swjs9593NdDouQ1mAgjO+AQTYVDKiUzmfXgEFnI/A8bvarLQpB ovlYksGYlHS/oWUWfkHLR3KnLl71E4Kjsty8Ftbt9pXZFyU4ahTzsZOqFT81tCiqmk4GuFfhAgVW SFWSGQJymO7LIeliZnCZfJfatzS07IBLPi1ACVVyJH3sXWo9Zq8zEU7iwQFKnMS97lLnO9BPFVWc I4FfLxJ4Q2/+UwQVowUSaF1s5bA95cxzDI+lZfTftnXPHP5rOvdoUHFHlc3pS6O5k1U28+668N/g 76NOYHe2A5UNY3bDf5QLVX6p8N8cs/u6MbvKLNx+s73CksYS33g6k6/6cxIe2yukHrN/1Q+TeFzR y9M/o936vl7M7ouZ04+SeLsOJDctznc7UbetzHcjqsvN5Ny6AvbtysJxIxUhgzwC6c2UPIJbPfR7 5CPcxB64mVUe/VKBHWTQI62/ArQn7PiSZjM9RVoCnxxKE0qkddSEUBp3HwqlwZi9UBpO4sGhNEGk pr8Z128Oyd03JDcT/b5aeO9zhNFmbt5vGZxjuVP2ZASIOjjH04NzsH3uzs2jdMDNi2P2gnOEGlp+ peDczM2b43xXnHDXMfJfXv7V4ZVxLYQ/hd+p4vV3JsT5QsglGz/hNDnOdzzQvPSloLo/pnDDD+xj zfx6Vnc64Tyj0tRhoVe/9tWiLA7r7fLwGtXbj0XpW/slrSEfz6L2Bf32rkxwJFiSpcZDFKlQ5r5u 5QCd06vFFtum4uEXKo2FcvswQdBxi/8BvU6jpMr/efA7xDjsiyzztE7sdzG0eEYEVkdvX81gqrQO CYY0rlPGIvwCthzsarEv7GIDIAXulE3sexCN37ojZJRRQJmLtpctVrigabfRXARqTc2dKpq5LTbR nm5miA4oS7qljSKlzc+IBAdEo+ZIY3cRXopGHOpFaImKtUd+1B3pJAGalpRptT0kkRjeA7E/VrVl 2LYrBnkuTXcIk3NfBwO6kuDppVkBIr5SmBeKiTqo/Fa3UQZfkKq6XXHajoGpiTdTI1OmFtoeIEdJ MC2GshK3MVW2PE3KGmjkFx3fpS8W642DGZYACdym9UAWIVM12QBeLCChBZMMpolR7tSew5q6UPsh 6+hF23py+C5KaJXAbBrnQDOG5WlOiz4KxC5MLi0cLCzNIIBgzdKsS1+4RQVRH3gvRRVPyXIRT593 xRoCFFFVpgMExO3KrIOG0EO7sIcQonPnTg6VxzPo0pBCZjpruOVHSYqANjU8zV0UebChUX218Vxt VvH/fHwEiNocoUeUaWFy+f/sncuS2zqShl/Fy47o6GjcSc5iNjObjpiY6ZjzAAwQF1tjlaQjqXyO ++kHSJISSbGKBHWpi7M2tstVCZACPvyZSCCphNJIz5tYNvxH0JbgHIMel0nRKy6ZrVrewDvs5hHB 0EwsgKNyAaCIAbo4OE85SbE8MJ2+yeoRK+ltqiDzwhEq2/2Tf7k9rEwGUoVdYjHyvKra647aF6Yh iSsNhR8gqJDgciyvHq7DYn/pclwk6d7O5ajbFHb4DXQ50OVAlwNdDnQ50OV4UcrnRr/uclCboc+B PsfMXhqWi85uZjs2oa46nT7kiO4Bugcfwj1YWHQi3rUVxuAM9yA98/jUxKV7ENrsuwfxG+geoHuA 
7gG6B+geoHuAOxLoHXzYHQl0OtDp+CWcjqW1X6hztBp1Ono/IbJz0pJ3zBI52+mIl4Hlkoh8YNEO 2mR5hk4HOh3odPQ+b3Q60OlAp+M0jHFPAr2O9+91SCqlvMyDcpgHhT7H5/E5crpwo4Oqiss5Ry9u uNEBbXY3Oupv4EYH+hzoc6DPgT4H+hy40YEux0d2OaBkWd/lKHJ0OdDl+GguR3sKAjYFjMs6LsfC 096xTAqnD86tgjb7Lkf8Broc6B6ge4DuAboHn8Q9wC0J9A/evX8gcm/i+935TbzNtpk7rL7JFt0D dA8+hXuw9GR2EEFi1mVQ3RKHlc+LydtSe1lQJhd5v03MgkL3AN0DdA/QPfi07gHuHqB3gN4Begfo Hby1d7D0YDbJs2rWvU033DyANrubB/U3cPMAvQP0DtA7QO/gk3gHuHmA7sG7dw8kFfYyuUhgchH6 B+/eP9hZv34+1YsZ+AeW0pN/wImqf4Z9+ct//S3BPyCmoN5qO1T/4o7nGZo2czL4hnpIsZhBr35x /yBKPcfHpR4U2EqTeuhuvDt3g8swt2JVH1BTYdUPn6/e7dY/45BR0R4rUpfBoOebIVPbOz8rj5+K StPxvFBQNPDP9fZrWdkwoPWxNLGIHY3d40kfMec6z2lrLa6qMDUgTO0S+1UVTp0sgcsCq3QNeA1r l0msa+gLTmB6rOtigm0PYYQkdQ+KUkctP1JLsoDSeGnDhFeesXaYxCf22z1wSsa+pRUM5rH+Z9VS JTxhGMRxcIDvw5L0WxCYhtNWdDUrdSDVDgox0liJkaXpTKWpLE7uQDQUFaGKrkCV9mmKKiviK2uW 6QYeNLHWpagEidPpj6r8/ryzGoQIsJySNLrdqj+5sT0rFGoEptvhjg6fiyx4rKD7VVTw3+vyzBH+ 0VPJ0lYmogMZghXzbbW2ZVgzG+eepsXTQmdY3usMWdSXwg36Qub0ZeK/45AeOf8aFSOTUfw0epET qAPAQk9S79wJv5lVlZ8TT+48cJ4TMzlLRw2c29TnS3gINa7Q/mF6saVa1Er30Yv1GxoNChZ0UVDw trLuRrXIhXQQCQT2dsJYImkSCqWcivMnvKl29YvMjB3iaQ8mpCds0CHuFvQoo1CJuYQKjOV2Ez5G kKkk3aEUimRRpLaWIBhSm1NxLU7cAQiaF0YWfHLBzIXYTQ0h3i5MqvVLYVKhMUw6OXStjh+EDz9T tkGzQ3nUXyHAFcMPaYEVYaiPzIh2fjgTROA2LNyNQQofSJoCuHXglfo8b83VtoLVKHdtsna7VVwr dsrXmzmdbv21uaRRJ9piRGb9B4RdIfDR0iwJSfNzYDFaidSXif0RUrURssYMVGVOFDv3i4+Z/XZz 1jv9YsqSVr7VO4LKqHdooZLvGCQqLyozzHSdvu8jfJhz9M6pCWvyQZt5f7tciBF1cge908T/xMwo 31K9o6yhkLWz2rm2ajFEI/I0vN6nALtygqi2dw0dOItTiOu00a9y6WwnxN8Yez8RfiYzAp9E6OFT 3CLZfd16DwQDscHTmK9yU7TR+eZZdXxvNk0oqtw70qCndfreV2y+W8Z9gj0xdwcQlMgeHWSlnVfI ffDvyZfd6W3f16rb5N2iClUmMvJA9py/7sSed1l9/Q5V0+84+vfPm7/t9P54GB39Ju+M/uJLMwmS Rz/LvaZpkQYuw+I5ufK2TXBFLeOG9NsszLDNtxj9k9HgpTtT8IZ64ftm2AJgReJCyTV4N7X7Fnq0 jUucX8FIcznseyQtcsG397AnPhLWhlSmRJ+Q5xVka8SICjjUvz+7Z3C64GnTcjWU5RBVAVmw28JD MpvuaWXcsQYZB7d2BnZOfFQ/QqZ5qpngsHdSQneaDxJIkbgxcXP43N6iKCSt3fv9U/spZvGlucRE 
E68g0eRf202dirY6HFfmAJuKUWKkhVfuldV+205qJj28unqKtgEN2MRiLi0en4ngHsQ5H0RoCb5h O4ZpFRe+ALM3ccVvuqwHDhFfZ6EBicqj+/MYPwhIWKQ53Ifq01iUCUl860+37wxuVqUq7VHvuLIT WsYYVfgonve/X6zslpxXdsW/NAt82spOZCF98Vhd27TZ07WE2Qp1Leranq7tbaH1R39hVXf0yy/N JEgd/c7xYkTXvnZeO/BIzBiaLxmo2yzOIaamE+KRJzKaTkzKwYWjH/Ih6LhwzHW6cGQ+01E47vZb 4w6HjrU4YNMUx6+g95TgkHcTlU+dsB0fEnbY0xJbgiUJaQy1TwFJCHq/1zFhKYv7BmkROCGchwSS 583qz5gt5/RTUBnmx9Mh7msIFZkmSVLW8weRkCj6Ute828ZCP4TeO/QCma+ueOpLs/ClrnheFnZ+ JOc2K15ss7fiQScet+K141MZMrWvu1jvRduRG8EpKeNwrfeVY1ACcvaScPGuAy83NfYrrMZ3sEhy obopQKXZPj2BMUiYSeP/rTKKBNGZjQ96CIKjDE8La4muYBZAEoRMsyd8kfGBWji4ja3VAsshIzlt wAmiNGT9Qh+7Z4YsHFtJc6duvBDfZ1MxwCSDs5l1lmT4UI7P+9aUTQvECKsgZww+kSboGs0UibP+ ZovxBzg4dFg7t3txUc+q06IuJYdEUJGcGCG1E5mr7lEI5UVVULep3nx7RnqVTcnnpYs61wUcJnEb WOPKb3s4KNCmqKcl4gdrVsbJ01g5BTA9nCMlaZ6UykTWZBVt9GYLgwxcqKiN0xyo0K/KdfrVtSdh 2ygxZY/r3OuOwciw5zavOjFvjWtXK6EwwXsPatJ9uxtNcpr9G2OD//7t5+F/f/8S/vy2/ePLb/F8 Sv8XePaqvYv/nvyKHulL/3dYVeugPiebOOrD94tf/ud/nMwctYn/v1vZL14fv7l9nST/5ef2efM1 /Gu7tm4/0cpqszq+4NJwl3nT0u9LvE+l+ccl/Ub735CIMaF0luzScNtH02gTLxmo2+y4NE0nXndp XnyKRPr1O9EL4o01MU6/0c4sD+JNmksI4k3bmus2zLA0y22Y8XSJIn/SIuc0z9tjA6u44VVttvFS rfDXeEgNzqhMD6/BebyKsv4puhjnWhn4907DGTMez3FRm2Q5KQg3w1ph4XzQi0E4mtGLINybmc0y ndfHi22Tpx3VLJwiJEkfOVPOUFIP65MhFWNnVKcZypSD7GW7q2MGdWeqNBvE2PqxwmLyPbyw47cw qtcxlFzB5RrG6jSDleJxjDxtjqFb5SYO5d1qD6GDOOZykdg/T+IzXvTOuKEnNwcDKZ7cw59VVVRD GH5XbtwfMMFiTk90dHwahee6XtOW5qqpBeqnOdBnW9FwblWKQjHSEQ3sJBou7lp4TTQESVuxwrPJ uxaCD9WnPClmN0GsEzQn/Tbdxf0OhD9QNJAqExW15k6ioX5DMdiw/xkXq05ASda3QKZBiOdKxUne jog4wgCILA2Is06cTi+g8RRNNKPNblXasC5v4FiYK7fV/9VTCU6gmsRnlF70n5Ese8Tpg6wLJuPT 6uteRxXzd3Kp4PPmspN6MjZVlcYuPpmcjNbpORefLJmMzTzQsjMPmjblm05GquNfXPExJmP9AmGR 3MDIh+QruEdApcl3Jiwcaz8NrfJ0gpvCfQlMT72SoUHDq3GDZIG5OafcH8WcOafc3w4O3w9bf1zt f7djcKCWduAgroCD9zQNDlfM3FObzsyYyneAAzwLy40uwmojqinELYaDySD3/PQRtrH8RMfX1BeT d81Aqkm6nVt0B6fu3Kn7hz6ab3b7FWbueeqeZ9N56srlU5dzf7Hr8PrUrfer0ppgpP8NVw3b7On4 
u67rCU0sjszN3dGbERPhChKT2tHQWElc0IOVivWtSL7IzA06gwhIl/b05GfDHKIxncV0EaCuQEA2 Q9rn7KbSPrT5ttKe2CwnGUp7lPYfFQ5naU8vpL1hrOrAIVsOB2nnSfvhN66R9tK+rbSXyrOMsyyb CvCgtMepe5W0p6PSvuuV51dMXX93aR+a6Et76VHao7RHBCRIe9aX9mGMyUJ2pX2xHAH5HGl/46h9 /tbS/vTgKO1R2n9IOJylPRvTB+wMB0qugEPxeGmfF28ctS8co1ZxP7UFjNIep+5V0p6NSvvu1KVX TF17d2kfmuhL+9yitEdpjwhIkPb8cs+d0E5CDmXLEVDRx0v70OYbR+0Jo7LAqD1K+48Kh7O0H4FD L1uPXpGtV2WPl/ZV9tYJOYZbkhk59QmhtMepe5W056PSnnem7hW5dFVxd2lfFQNpXxUo7VHaIwIm EeB+uM3x8Pcm6HaxencRsDwnT1AZV9J5CKh/xVMq6KRkG/w+G7SZXZZleNjqHb7a/7jT6s2lU/FO lvqmAOvW+mcs6Lzdfy/hAKiOx76qtCMdwkso6PzjKZ74KnuV23jisJdexdOksUNuf1bjOlZj9GmH 5ZZcUzLrBVZ6pIdkSQcROEnAoS8Ap7NN2MQVFmkOkaUCp+I2TwaOGLR5CZyHaA4EDgIHgfM6cNgM 4CzPKiQmTeHEmzSCh3KVwoltInAQOAicdwkcPgqczHfSmOnyXEhBaTJwZJ4MHGYGbSJwUoGjZF1Y s77QOgzX0zjzCnGDuLl6/+WbW+/c/ssANzBVtFRFBzfL8y4F5fNwU/+RnfAwlwXt16DNIW7IYyM4 7de99l+IgXuczPZpt3YgQ+pbcBOvLEEwIBhGwAB1GGwT2/2t6zmEedU9aymuiLS4mWDoj7HJVOaz buHGGUHzQZtnMPCRJh4FhqlbpxbrkPiGXp+FRfIs5IoxU1ea2Lj9ypTPm936+WuQOT9WcA8fjexh SXkS3DNJ4+BtbB3W+kddATsqpSLpfq8AibrwerX+3vYtAqOdDom7wAhFhOLLUKRjzhmjugPFK86Y uuINoFggFD8fFJFjyLHXOMb64i6SgWVUdMXdFQduXTXKsf5P8BtzrEKOfT6OobhDKD4MinxU3DHf geIVR5CdTRV3UhtN50KRMiZySUQ+aPMy8p59olAYzmic0WMzOt573NzKfRHcZi4/z2h5xblhP3Mv LXEuDPfSyKDN/mkDp9qbu+88o8/fmJVn/Q4ylBEOCIeX4FButseV/zkBhytOJvuZmT03hYMcwMFz 9RgfCOGAcPgUcNBH3Z5qGFMO5zJgNF9+sIGkbYsvhUMvUZnU2+JdODD/mGOJCAeEw6eBA50Hh+Vb 44TOPIRwUzhkF3DI0a1AOCAckuDA5sHhigMDdOYW8U3hUFzAgY7ccIBwQDggHF6GA58Hh+X7roTO 3GK4KRxsHw6EVg+67QThgHD4LHAo9fOfc+CwfP+RsIfsVvThwOgQDtqjW4FwQDjM3604uP1qa8fh ILJz5iktlm9lhnewCA6TJez7TXAy+EZ/KnOeNSd5HgYHqZR3d7pEkQchJGO93I3783jOqIrTmSbV wg6GZBZvPayeD6Xf7kunzbdoMFhTULU+LefL5RnU8YWRdZ7GrIo3MnKSVqf7LqgJXZRQlnzQRbKo g4iauaj5Q+8suUBN83WuxsLE4o1RFosajV3JePoJmvtimNIgxDQHegbb88WnNq3psEcZQlyePxY1 zDrF75UcCrYjavyh1Ma4w6GsDw4fvu1Xm/A7cX4DeUxaHrlwjAErwE7M5axiVqiIaaFUptqq4BbY epxFM8TDscXUUuv3EDfCaU67nSOLuvaOWEMVKeKVFX6tj+XBbWz5j3/+o3zShzZ7ViQeKXj/7KJz 
2LX4rmkWqzo9nl2+wy7GYj0XwZFdyC5k1ydhl15tezknp8nPhba6cyiHicVXYbNKPSbnhA7a7MZ/ AsxyhsFhjP8gbN4UNi/msPCeUFp8eXeY+I/JYaGDNvuwUVLhNjXC5n3B5t3D4cUclj4cFme/hnn5 mBwWOmhzCAeCqfEIh/cFh19OibyYE9OHjboCNo/JiaGDNgewEQW6PQgbhM2bweZPf1hvv47caseq vLC9GMvi7FwW6wJOwqZfTYQLUdGEWoSc+kpXgzbPsAmvzyruHgebbpt32kaHNxSGRPgIy+Nebw7x hl13dKVerf8a94MjdKau9u2bhJfYmKyefbna2u3GlSb0sNLm+6EddDLpfhEuKRMjZjsXlgi8sATh +G7hSEfhaLpwXJydHEA1fiK6/xOsX6iVWDaZ7dEacMIwSru3QoHFrLN5JkguH34imvPC3kuJ1W8o DImyW6MVLgNOIyL0MaIrDIQyljhwpd2egBj3u3K4CSqt1gESEYn4oYl4cU+eyFRFekRcnJLNYrnJ KSIWfSLOl4vQW5ar0FkysNjxTX1RUMMfl5LdbfPjyEWkGFLsI1OMj1DM+q7Tu/warECU12/7pD4L M8jnfUXE1eR7fpViVV/XwfM8LCkqdsKwXGtl7kUxeEMNcqIkg3l9rBEmBIyNpPQeZBgy7KMyzOqj biJ3A9/U9Rh2RVJ6Nb5L0PsJGxjW/4l8shLTIPI3sDg4/wLXfT7SN236Ze6lxJwXtkFOHKXVagvF sdbrrWlHB24WIHPeL3PoGHO63p+8IpncLErIvI45ZqQ21mOYM3iqKa2BzEHm/JrMYSO+Wp85VySB z603fFPmdGPwhDormcoeeoAFdQ4yB5nzCnP4GHN6vtUVueBmPBe89xO39q1MLxecUOMKV6FvhcxB 5rwxcw7msCrdt7JNxGoncGxWc6JOzOFXpJgLKl3+2MuOoE1+skhbLr2Bb3WnmPQNsz65sayKQ74e Dfv9dl9+0xu7hsO+WbzThNmk8RotCvqSRZJu8H0h4kNMaTo6pcX5iiKuludWCl09fkrrCqc0Tulf eEqz0Slt3GlKC7V8lZY2mzele29tfp1IaEJrnw/aPE9prBM5ey4WlgvIM4ShsTq6PeQIwq1mUNxR BaMqLa8GkfEJkcFHkeFJBxnLVUCQ2PdHhhsgw3FEBiIDkXE3ZIgBMuCrPfVZI2N5lq10dhQZvZ+w /ipkODVEhkVkIDIQGXdDhpxERrY8HUx6OoUMT69FxtAxCW0iMhAZiIx7IUONOSZCdpGxPLNC+nGV MZhbVyHDD1WGR5WByEBk3A8Z2SgysryDjCvCn97fHxkXKsMjMhAZiIybI+PJ8nKvV7Z1TDLf/zoj Y3mFN6a4vczffgkZp1+xk6fUBm5Uc0PNuc1umXkmK1nxh54YZvFf5G4XzPhM2bgJCh9fHFa84GFE cJ00PIVSHswczDdnn9euPK6e3Pb5GAxSF+zp1Loluoqjfbd3O70P1rYBZCuwFuu0yCTWCMIkj6ei n+w5b8rGp6QkMQnrHqldIrruatA9sqh3yKu5vGLAq0HSRkxy4I42l2ABr4rlgRSeFfN51c5GWpjJ zANGKpWH3jZ/kkGb7DKB4oG8ivmzmVF3qysFb6jhFY3DyhgW8zXS0gRvmPuB6EP0fSj00R76XpZq xfK0d55V0+gb3AU4C328QR8fQ1/1xugDeVgh+hB9iL73iT4yF31qOfqKGV7qEvRlzArFvK//7KOv sEP0CSKqx6s+huhD9CH63iP6xMwAXbE82YhXPilAR4TVWUanA3SdJnIh+ugLbfYCdOEn5OMCdA2e FGNmaoYuRR/YDkPiq9u4/cqUz5vd+vnruaQyhatJp8CLQT9kIDIwrPiHnxsT+fafA6q4876mJIuD 
fjT3lZHTDKQXUJzZROiscUZ0L3qGNrtQjN+o3GODfnWv7rVJoajUBnYgX6RgnNksbcPwLmQNJjOw eSx1wEOAxeEYxnC0x1gVzzNnaXOK+YLGuwuP+5+RiNatAbP7Mo7kuPka90oTC7oyTmk8TG6d18/r 4yjP0vaIfVZfoR07FWbZ78/uEMGt8ti7jCZ1jxtPDW93chtjpY8dY97EdUomLSzch0+F16y12/a9 iSruWufF1OXkF6uKzAfY9iyd28j/h/P/T3+onr2t0T7c9MmJ7vB/8cVmNMxVp+9xUrfbhCY93Mc2 Zc8VD5r64Zs+1KjsTu6/UBlpUjsuhGau04Um8xlUQNntt1Bx+2wtjrMkIHDJBNxf0YyuyDwJ0zjx lm0mmO2ZIelWEAYpMIjrgB2FQdZJpZdkcV4sLQrrVOIOMC/CYE9oggW5SQZt0oFDnPuH3Dg2eKqp T2j57dDhDf17U62jskFe6WNpoLp9XNH51K1DHwctGc1czQQYq22mHc3SLicIdmAiduyQdDMIl9lw eQ4uRKszLjJodX5WGoo0p4Hpl7/89/8kwIUxRgzX9OLiMDahNFg++Z5bFDKiGCdK9tssTJ8lQjym 3loDTzFTLy2/E+TGIa2qiDjQ1kIsq/x/9q5tyXHcyP5KPW6EY724E3zxkx/3yT/AAAigWtu6WZee aX/9IpOkRKpULYFFaSQ17AjHWKNKQCRw8pxEIjNqmb3vIlriYpm3gTHluAcZs45yslqv5lAOnzkU MUl2Cu4ZqGnQQn7ua/hxPACyCHkpM/nEkuAIxRVOpw3WoXhMK6f/FT06N9vd2yJirXn3bxu/9mbn XfTW8OK2SbMQQVkAuP9AbwBoAzXb7mY1JDgLULEqDaGKwugmZuGq+Wr1HbtT2XqExCuUlxBTcOtm HTY2kjogMOUt6taqCvFNrf6oFqv9EqwxeO9FqrGakmYBHX6ZwtCqSfxlVnF44ovlLv62armq/J/r 2QY3SICFnYbnklgKmvrHIpraLMy8WsdFAW4TnUPi3EREG3AOq01cDT+3x71CLcRxIiqlRZtsySi+ gf26an0WRq10WuyG6ghqzcOHX1fhjoE1ysFYoVKduqXwKw+B7T5GQVwjsTEGMyXGv9whdm8KWGEm sUZhISQJGE/qPXgLCEwvN0+5bayLkVJq38xt5xfYJw7uLMDk9CX8vPCvv//far9Zmrk7xxwKIjvm QCUbnaJAbVleF6PoP0WtL7v15s+6izV9FYJjypMPhpeI7yRLLu7a0c0I4Qkh858td1C8UsCCkGku 3tloBmhttw5gG5qm+WBih5qbVCF1lnhyMj8yanYPpCV+qxKk10ZBI8KMrlQEcJWOMMTZi4txMESH H8cxh1FQYeg9tQmUYIAo6K2SoJonBCdBcWVWm39H7oXNSTlIgCKJuD901CMHVDNO/TpA65ztMSE+ Ok+d1MHeNEDLIh92tmsxehyT9nCKRlVPcoD2KaEqB2ifE1x+LbN0eQQXMT4ViAp3d5kFY8qTD7LM yjLrERDht6Iv69VmtzDrg8z6LOGaqg5hUo+AgEqQIi7k0yOgU4SRRW+zivgOLgdCh0OUeviBcf0P tFHmPjKr94Fi5a0qIjRPCAhHlFcV9pRpVuv2O1AYTJSmPk1uMaUKWKqzdbXeb79V62hvtnyvwsYs sIIKJ5jXZpO4jGCFs2VjtsthfPe7aPS9XcU6LSftgQ+/ophjeDizq9fd2VdzEMLSFF3BS87bw6+f 28MxGkZveeKB3CMcf93L4o26Gt3EqjBMBnwvzfaFcxLYZUahrST9wYT0DE5IQvwObK/uTIlh3m3a qYYoPJ53hdncR+9QLVetMaphSzCTeMVB6YDHZ/Dc4i7bQIJw81WYoMRU2TSLrums9SOCSfhZzaJh 
zJXFVNmkRS1IodBUc4KzdIvtOz61ApOq0+7tkZIYACaIeFWIy0u/3Xns0MWSgUkozQmsumq7ni0x jFbZb39ramMlNg4ruGHdr2yxhDOE8kc5Uoomow7vAr7/envb7JfL6HwwjNh+CkSkkIm5KN97Zj9G kllxtKz4WIpDS+usv0hxvpJPyy3jhSInY/Y5T3TOhwsWzft68s434B5IebxTgJCLp/hcArHnKmn1 TswbbkDA4GqZNB+vllX+z3q+385+HFiJTGNMwgUOmma/nP3ZWAyrTaR3WHOuBNdl0rzNbS6bCS+w LiBO071H1tmD5Mg6UXwVib7HcHJ4SbP4PDezHTxGB4/Rp/lFpQUuoPZ2xtEW+FhKk67g3IhTMF17 duCNEfpgxZQCdkuZyrKnuoR+o5+qtK1hs/wI28ObsBAcSdx0qiSidWoHOwJWh3qYfAvw/X0f9qlM L8YXLqQ8UsgPHqpfUT36XjOo6EBFGd9BQho4DFGQ4QfOfrT4Oj5saqcjLTeYcPSjbhjmxs+92cKa DU0wPrXJ6oTiV9Y15o7B5Da+/vE3pNBgh/3VSpPLWjUauN5A6mUFDWoPvxUTJ6lM84HBBspato+R WeIx9pn2ApiNq7EBbP8nKvOiCJjrmEjNp4l+BuuJ6P0oMuIn3SpYuVnXs1UkAR0KHlCDyZIDahxR cHR1CBLqcLYV9peZ/AcDwzEHsMiEvnuwsnH4N0PB6chE7oGdz0o/BQd69jJLKY+JGEVJRoODCe46 cBBSCE445XDxpLxcEkRwSeCslBecx/81J2MOwIGq2mdwyOCQwSEVHNgV4PCFRApDRzCHuPqThgiD miow5gAcrFTkPvopg0MGh1cCB34FOIwvuEFYOQIcWAo4xCGG4ABj9sGBOC9lBocMDhkcrgaHZdge A6/D00MZN1gfHMYX4CjqMzXBIziIUrioCnT8r4hKwsVvRIFQgKyQTARxcf3gHzffNYJGA4MxnSNv hVJSCRaieR01C32dyCtUJAtNyG+xck0xMjghgByBOi3d4HFzux855Oodx6cGWyga8AxTR9JQeSKI 8a7JtGmnQsbN5J4A07EP0wOY8RdITCjOAswH9nHEB1XH15ew+5kkobDDMQFgzn7wGgCDv7gJ3f+x 7vKtBAdPKi7XL8kIkxHmEShMH2FGN8qEijtnEaa3/bmuj6X4YHwmr9j+n0MUjDlEGJiEvG8rFRwz U5jXBJj41Bz+zKpyqz+Wh3IogDNl2i2QDFUTqy01Xm15/ilUde8d4gonf8IvpncNLfYbdIKBIfdp bsm/DhmaMhQzXQGBDKCZob0W7I0udkJ5EVJhD/7kS7AXDWTYy7CXYS/D3tdgb3RWMwkyle3hn3wF 9sBAhr0Mexn2MuylR/z7sDc+jVnb82zvFB9O/uRLsBcNAOwNv6HEK8FehoUMC381G9JkfAKzvRSm hztew9aVUsAnSbAQpzsccxCmby2+Dixk3pEB5gkAZrOu/47l9N053qFID2DaTIMRhRAECWfafVzK c6RfuT4FY0K7j7PfuDHAHD9oCuffBmCmKsr/m1ROev22IWJE25Ap2nMUpHZNs5D5bPm9Wpvdt/hr 5qDXbQGCvXZpx6ITd9YoSMCKOR9mV8PqTqz2IGolYG7YHxWuDLcZPgZ+qEub2ONWZ5q6gQhToWmP vTA/q9XawwKXaIcmrbUHbUSiTC0VOueN8z+6H4juWaTV/1AFC232WHyT684UdkriaY1jVeTXsOy3 u82yXv+swma1qPZbZKEckS9xm0/W1eSGNTEWERn95qxe4p70epDogoyjM4zRMpzrXnbQSzgg18x1 32DxbUTq4C4+78/oTDPmRzpDw/310sUiQ1kv/eY8pBzDQyb2N9y6Ap5z9KrLLb6w4MzPv7UF7hKd 
xbTGTBAKL7ZEQ+j1d1DekaJoTrz7Y0oLS8AvkWJX3zbdrlFQ2ymtEFq05iS8gtYKNJHbwAtoO4Qn cq5ncBbrWf19v77GWWj4Z/QZydqX1OWIVpdf075xzL9K+w4GZPqS2nu9mPuLQfzdLBKNsFh1BR2r erVYoDEOez0R/xlxWNq06teHhDJwIrFKrSkc/FCsqhV/LWoOY+egrDiA7OUqHgN7IpQFXCjFYoaR mftBNcO4X/4nuZLztNUMsyPOjvjujvjfi/eDZjt1xLUu+45YjXXERJjzqu0XXpILydRFAd8OwRhE UvXg9DyO2fWcbjoqUSgsee8gNGc1uVTFbmwQGp/QP7D9T7XbmOUWEbKy+wDRFAgnsjqpOGsz29bi bLeyAN1U4YpNDWxnkvBaJGHaJteTVtpXXNeNsSpCY1xwrdHlyoEr1sA9aGIHHMWVgfXb2HvfrP5A /4RxxZDY0HvihtBxM7i2e1jlZhtWQaA+/oNf7pTAf6i7dZjm/TgnDMvl4/bfL9vEP+zQrVObMdaB t5YgLI7IFKeGQXHEE5NYk2I6xsBq7mBu9X4Dz6wKDZwACsDruOwRhgvFKQtosp6tfTVb1TvwzAro gkrC3udgC9vtt89yYoixgvfYgh4d4y2K4nq2cPiTUl/PFvD7Vpx88OKyPR81Z9f+yWlV5IkAiHNv Ns2ZHDRvKVDypMnh2qKr69x603lBo9e8nMn6LHxjaof+mAerXOsaNrid7XCDH/Y3bMwysdvNZL4t olhgzVn7uqrnKyy0LuGI9mJNs7s5ydny77uwW5+2IG7yugzt5XV1l6ZHOMlQX+kkTz/4wkFoHPMh nOTlX/EYTlIIoj0o9L07+jVsFkUTvdFv4Nci6dNNKpOr5qvVd4wb2yYpKrEPia8paZ7WwRCKGGoS p5QztHKGVs7QesEMredQ25AgDtT3bLtxI9iBSJRcjiUSjIlzh+S/IhLaxiVwGRmOQxDOuhrYhzFz gvhVDIJ6r0u0s5ybqvVDgIAsCWKy+n86lnQ+wYulJ3gJ4jk+l6jGq3q1/lk5szPYzO/QpxW8DfWJ 5BbSBltyC/ePmnN0WiLBDUk+IvJkHVR3jt5kH/RscuzyntiRtuDcdhaPpppz+dS+klN2kb1RMeqJ OYmgjnF4u0u/Q2o4W249HoVTdLc0zd9O3Yn3cRv75rDJV9jOzEV67j6cLXCv1LEsZinG54/XinqR nBJ4OeDQT0A/cpnDmC70MhFoZBNc6LuHTWi49IZeMCVQ1+rQaNuDN69ad05NQKqSmNw2fSPEuDll O8HNrtrO3uFnL8wWRLgEe/XDtD1dOOMWnyQLedJGM3GLSjr2xioRzl6XtTv4k6gnEpKFZGH1RwPD D/x9tmgf7yFWdJstOnkc6pH3/GsJgU8sTliliRhhmsOxLm63a+CHYU5sWpBm2h7xwJaxBfkv2HKZ zpYfNeWZCRQtC3dIJaFMYu4pK1IDXNPGKHVZiLLZU13YjcBaY5fTLk7slBqotp1/P4kGyrQfGA1Z +cEQ88mWpqyd9hRhxXqzWn6W8usJo0cvrsZ7cR2kt9d68e7pFSZcPIo/DEGpcnJYMCuOqfT5bzRD 3INo68Je5CKjvTg8IdiFx5MtDN8lJq09btJ8pBWi0A3OLM1ytZ17D79SgCcq0jLq4rzQRXbz6tuT mBTK0hwlNzqYnkEA+31XDSax6A433nW0f/BD6/Tslhtixbvf7X5+xIr2CCL0sEKPrlFDlJDOpIly KT2hibXxBlgBY8q/+giCMc0vnj2OvtQNtuOq2EQ/ZLY+MpzldhV5+tYvkAXgCVydtNUf+RiBGykw 2QKdeP3NIJlzmImbZukGKXaGSQ45x3EzdadBBbBhm3bCoXSNoeMfYduZwXCnSzUTPOlCDm27QXjg 
6mECDaewU4Thf3qwM/LkEyGgOAM7x5NPRoUyQ9iBD74IO0WGnQw7GXaeHXbUF2Cn/CXbYZSURA+P IOIHX4SdMsNOhp0MO88OO8UXYMdegB3L9BB24IMvwo7NsJNhJ8POs8POyMucCAHu17ATX1o9hB34 4Iuw4zLsZNjJsPMUsNNtTxjZmEB6sFN+AXZCeki5+CrshAw7GXYy7Dwk7PyidAX3QtgOdljcym9t BYv0bgv8XKGrX+WuQSL9l7quw5iD3DWcxP1y146TCDc69Z4YIrghDq/bwc5p75tIyAVViRk2L5Vl dv66CR9z3aSMzxevm0Qk2EKlnja3ICRn0wuua9dco8LLQfvtt2rtl262fK/CxiywhkPhEB5MUh7S hKUqJrw9IIgOhe052G2T1Yc3x8o0v8p8YYv2VtWxzGVp8XpEmXjd4qHLXN7kPs0NMjanrDsV0bYu WAticzfb1pXzGw+V/7D4VJGYMTvdTZOSiNaDH1bJgzEBa7bf+kzgs7hHZAKQ04KEIPlaLZVX5LSw 8sM924uIPxyC6pMPZJ8JGObt/QRI00zOCi70zRq7TZq3m7VH1h53QZxT7fELxNGjtYf2V17kP/zJ 17VHHHOoPWASd9UeUE3AS3lDxJkmtT9rmKxhsobJGiZrmCOIZQ0zkYb519vbZr9cRkjB+4Dtp0go GHlrecV1hOLPsLX74A5UpR8mdST4nuU283ZE12utuTdnul5/YBon///aK75w3RXK/5OTMeVHi/cV RwIrBNyKqvwGXdy4ZKIEzGjXKWCFxGtGSXQMzCCtO5oh6Vag2IcCyPneXB0EmICquUUa+E/U6TpO hunBZMioudyk13V8ztufy/q0JmoDK473Tl+YGgsrEP7w6hpY6S0EXsbFfn3MhRf1oM0IjkmHB7AQ o/gLaqJeekNjS5nhE4KtMl+9V9Ztdxuzq2pgmhSvh6cVT546gvPASFXQou3TgEsfJBo8MFok8Wmw w/TQDkk3k7HqBtEaVo6L1kANIEnve1LcjDloiYSTuHO0Bidxq5PiHK35LaM1OcKSIyw5wpIjLDc9 Je4pFqLBax5YAKdvLRlIZgHMO69HdHG46M36sz1hATAm75/ZcFHUr9TFYeL65cyUpvW6raM0DLai CWl70TBZdCSgq+6E5f1S+5fdogRhbboD1NMShBa25eOUINyYmdut1ue2Jfdx6N62lG/t7kw+Si1J uPtRahxzeJQKk7h3GidM4nZpnA+r1V+MWd/L4nQtWyatZjidqWlLGT5m0cHJ25kZWRBykrgkvRmT uDTRL7xJAtQkvvwp6gVeHw/jo+NhohDn+5m136CMeG1JPWU8DMYcxMNwEn9BYW52cV/keNjre+2c vZRjazm2Ns1eyrG1Z42tgYhXR0YhmtgaT4+twfK5LrY29Jf24lPqZotDOKdPxhzE1gg1tL5rbK35 WN2QUUwlr3KY7nXDdK6nGN7e/jnc4YwfijwwWvDmOyPCdCGUkt2iB/JhQ/Pa14Lq4ZjCnX5g73uG 3szqRtFzrqg0TVTn3S/9ZlZX++V6vn+P7u3HrPYdGUprQMgDqwNwBOjkim0moKv7ZrbGlq64UUWR lkMyvaoPOi7Hf0Af1mhp4/+991vEI+zZLMu0RKebkCKuiMDK7d2rOZkqbaJvIS1TSbEIlYADe7uY 7So7W3V5ozaxJ0Nkqk23ymijgmtwXZ9dvAFH03gqF4Fa02Q+Ve3cZqtIpNsZovJkSTXho0lpyzMm QXloRPm03CzCa9GaQx8G7VqxLsqPplueJJBkJWXa3T9JJHakALM/Fg2L61opgz2XhvPClNw3UYC+ JXh6aR5bxFcK80Iz0V/U35oWzyACadG0Uk5bMTA18WFqZMzUQtef5GAJpsXQVuIypro2x0lZA00G 
o+Kd+2q2XDlc0A7eA7dpDZpFUEUTAYY3C1BogT/BPDGinNoQWVMXGtGwjPrZNrPDl1FDHwdm03Sh Zgzvrx5/9cEgtohyaeFbYamC0IE1c7OsfeVmG4j3wIupNnGbzGdx+3lXLSE0Ef1aOkJAxK5WPTiE Bt+V3YcQlZg7qh+Pm9ClQYVUWrWZ5gdLBQHXZ3iathNlwAa3XRfzKmxWi/g/Pj4ChG2O2CPq1Bx2 ZzuUwB/ejwvgekpsqaO0wO0N8TRYUYcYAzQcppdrY93D/03TV5mXnlDZHVL8x2/Qn9SYnusT25tr a7tLzN0DMxiUSQOwJ5Dtp6QeR8R7LVwL4XWP1I/vR26ii/5I6tnwG6I4UnBZOk3qixeA+7dwtCRC n1h0J2MyXWRSf5E4Rlbfg98PNgHWWBJuZKGQhUIWClkoPIlQKGz9a6FQl1knZJ2QCX0m9M9N6Ec2 noDyPHENniP0JzF5OlmUvhlTuNMPcpQ+E/pM6DOhz4Q+E/oc+c+M/mkj/0xCvfVodr9cmHXECYPr vawx7SwpYzVrjqw5HjYzaGz7F+o9tddojgkzg3DMfmZQ80HWHFkfZH2Q9UHWBy+iD3LAP8uDx5cH koVTecBLneVBlgevIw80HXkkQZXl8rqLA+2fEFFrbqW6Vh4cvj8cc3BxAI9Y7tqcJcuDkOVBlgdZ HmR5MDCU5UGWB7+XPBDuozwQWR5kefB08gBHPJuxpEfeK45UXXB654wlHLOfsdR8kE8PcsZSlhw3 kxy1PCs5SvowksPp85IDyXyWHFly5IylrDmeQXOImmnxj2NZtW7JS3hX9HIHpqw6sup4CtUx9uJz 5EHiztWMmjEHqgM/yKojq46sOvJBRz7oyKojq46sOp5XdUgqpfxw0oFVdfNJR9Ycr6I5xt7NJrqw 54stfZoIxT1TVFzGoNPvD8fMiVBZc2TNkTXH0WTWHFlzZM2RNcdLaA5VftAcjmXNkTXHo2uOtQvz /aE1y8nlC86Kg+bgbY+WN/b2X//73wmag9QlDc6401MMccNzjnZMTU4+UFlzvII+uIkuyprjS5qD y7jBIHcJKVV0/ZHPm/V6/hPWjQJ7rEz1hZHUt76wsXf8rRzeysWOR0N7vFTY8e/P+eq9si6uarOr auhAR2F6POkVc260pp01cK24PzDV3yfOy5ZeHSyhbkFX3aC8QQdWJz46YoPvPTrIwYCnhn3o0rQf t4Gx7rXCDMNqg5tMwmpL64fLodmm7VRf/I1x0cG0ULCwxFL4qua0Y0qte41Kco1dDym0PWRp5FAZ KssDhwdDQOMU8Heb1hJS2KKER9b61naz08TGksIKAsv/D1t936+dQfaAAExJGhpNNR9du4EVij30 0u1wT09/FxnxsyJZV+AFvu++taxRgLwo0joZEhN3crRSf5vNXRUdXavIaZqTi5NhejAZMmoupT+Z C7lmLhf+db1ZLY+B5UHzvShe7KGdr6ASklloqZIL8BClS1ufVtS/nMwSte6lIdBON4Q7du9tx9TD 3BUhzlCyG5C8lsSKK6nq2OZ7ytUUdss6MrCuyx16U52mA2/TsFN5QVQ3u3bdcwZehpu0pa+09K6n U1tjjyNTmSwIvok4wwXo/PX7KgTMZ/x/9s5muXEcScCv4uNGTGwM/giQl73sXOayMzH9AAyAAGyN ZUlFSdVV8/SLBEmJpOQiQYm07IIv1eVWJUAK+SH/kPBN2GgYBXlaZI2LWT+rhPemw0IPPLUG1R5m A8HHcjDb134OsAecTY+gQPZIxoUed/Fn7++DL7s120pAd0zabjisBBPLJLUC3eSp7HnI2zpnuGVz xtVfHjf/vZPlYX919Rdpa/VnT7USBK9+klqJx6Z0q2d2HgIf3nmbISjHmtACdcfMiv6YH7H68UzX 
3lZvqON+1svWAzbwalhOpc9kVKma6vbz0tiVX2km9X574AXo3PrA7t4J1Me18fdab4+HJh8XmP+h qfIpB8gVwnPmztM++liKf9qwhAPX1NDGLNht/UMSSFykYcgQ1JAaGacLsqkF64clYZFYwaj3/XM/ nfqL9KQI27ZnuKH87hJZluAqlVe+Nd+igJdmArMllvtsyX+2myqfutofVsXeB8XAxAhzkOY6ZXLf SUqSWP/qKhVtkpc+CENMmH8qmHMPQOedEZp3LnnHCjY+B7OPyJPcd1t3HEK2SqV6EuUH8+MAX4TP uuPUF97ZMBYJliBbG7Wnd+ZL+DAPe9QZd3aEc8jMuq/iWH672Nk1Ou/snD7VG3zYzo6SLLHZsnZt PWbHrkVEq2jXRru2Y9eCk37dq6MmxaS1+pOnWglCV78xNLti177fsw0TxyM2Ymm+J6AaMzuHmOpJ sCWPR9WTGDQHJ65+57cihq8bjqkMNxyJFRIMx125Lcx+35IGCzbM4vgd7D3OqM8bgeVTVR3BQ/qI 86Ar05eU+LB+5VP4oLwsSwkJNwFZi7AIHGPG+oTKcbP6ASlfI9+clVF8f9s/Axs5MC1BYaU7XJMC olzF2siygi3kyYSP9n9BYzSaj6G7532jqp/Cctx3QqJ/PAnb/Wntnfyp3kJD906bZHp8TOg+eyeM 2dk7/SSW2zub9ckLNJSVnmw5gmzghnNvcliuVTUqhDd8NjwIFw8dwrmrsN9hX59BIkpZ5Vl7WY7a xfbtzQujsDzC+O/EaaLb4vw2B99B2KpFUmh40L0zXXL3tH4vkaoqMAG7IwmTx2wmaM/u2JuNruwO kvranLAFxxCXFDVzbJfQal/FGWZ63Hkjnic96WAi/FGFqv7AfSmHY9mI0oFh17vtop+wANaPeOXQ XYKwmFgAqwRTODVXCmC7n6Dn3ZgpxKEAaGyGhhCWFilLexKzi2oLEQtgh6TGQ3exADYeuhuVIY2H 7r7iobtZOpoTYjNvk2mz9o5O2VRNe6UNcyge/bwdFfC1gqTncnvc1CfgKtcug++kCLPIHq2Cl5j3 n2/C48VK3tGxo7Uxu3fzLvURLWetOiMGso4EseBKXgn/YdQcV2m+G3yqxuQfXk+Upiob+oamxo6o zLzxYjY+lJK/lJ6CzRmDsJMUTppOQAdrKaeMu/Wnd1FY6J8LJtJq+9jIzdYvMh/zh4UfZGLAvJRp zastL/F1ToFbm9NpK1sCwYw8NupEwsBFpakCbs4d7TxoEZ5CmNEl/ePn/l/fntyfL9s/n/4Am+o2 eYM/kPh47//tV2rtdv/BIQ5y/3rxj//5vycxB7dZuD93K/1k5eHFlBUbn366/ePZ/W271qYcGGW1 WR3akfMO/TJ7qrl4gs449V8CG+QQxqUIjpxTfUPkvBqzFTmvJ7Fo1rmeRMw6f8noNKU4TWntU62g Qktttvlhm7v/BAvcmybDy6srUirs62PPxxYhvLEq/N930gdLKBzEw/pjc73am4Xv5nqxwFNyvfOI FUKmlfel8/V2++q3uiqugULP5hQYVcv6JIhDPAzLMEGCmwRiOXpXpaaqyYSZKwIVunost5m8uhd2 eHGreg21D8q7WIUOiywLxSmskbfNwU0r38BS3q1KH26BNZeG2esCWQTPeDG7wnx8wuDOz8oVlr5u ZJdvzJ9ewaAIHQIYX7FQs/bjdGM0nEdNWMZJ22ggJ6MhKMDvTFpFMksGO1ykvJ2odpRHg37yaQik DcMp6o5pLrpqILpkqZrPMuhiJqOhekOQ0yp/wmbVylv6Vjg01GdKORdnzx5ce+GBGHgS8D6BBgpB LBAji90q125fdtZtDlveVv27UiUfeAgNvCeWdZ8RTXvEWeIXb6vnUoIV81d0acErylvKWF/mFZpt 
84qhjRzTbmaKMtZ6IJOWHtRjJh+qjFja1GYm+xzKWL1Av0lu/Mr3mTUfROdh5jth2kc1T0srPwXu cEZ8VD7s+DBhBVXXBaIJ4mJwc/ROvd/aw6r8pq/BAWvcggO7AQ7W4jA43KC5pzFNMUKV5wpukrSQ mdttmBpC3GQ4FMIfljx9hfWSCGv5A2J87qktxtdGh8u5x3Si6o5V3T/loXjR22evuWfVPWvTWXWT 6apLqb3IOvxadauyqLAhCOr+wqj+mB07foGa1lFDTI7M3a9wjFHu69+b1VBLCdzQnRRFulISOknM HSYTERBu2uOTn+11CEPVdNFGAL8BAWKEaZ+Su5r2bsyPNe2RFikS0bSPpv1nhcPZtMcXpn1BiGrB YWKVLShqoseZ9v1f3GLaJ/pjTfuEWyIoEWIowBNN+6i6N5n2+Kpp3/bK0xtU185u2rshuqZ9YqNp H037iIAA0550TXu3xpIsaZv22XQEpGNM+ztH7dOPNu1PDx5N+2jaf0o4nE17cs0+IGc4YHQDHLLl Tfs0++CofWYI1pzaoRRwNO2j6t5k2pOrpn1bdfENqqtnN+3dEF3TPtXRtI+mfURAgGlPL3PuCLcK cuqmbpMQoPDypr0b84Oj9ojgJBbkRNP+08LhbNpfgYMSaQsON1TrKbG8aa/ERxfkFFQjUSRznTaM pv3vrbon055eNe1pS3VvqKVT2eymvcp6pr3KomkfTfuIgEEEmO9mc9j/tQ669XfvtL17T6/JYziB nXQcAqp/Yp22Dd/o1Pv3uDemuLxHbLHd2/00/2Om3ZsmhkPrv6pTgDZr+ROu0d6Wr7k/ACrh2JcK PA+YYAtHQKvGNW65ntaZDWwNmVgOgmA6pjzb4hK6wgQ2PZ/nqi73+nzrpP4M0ZQJRtwE4Qa/g5uk hRs+3eJgIhQ3yhnawbihvTEvcbOIxRFxE3ETcfMr3JB3cMNauJleUYiKMOsGumg47yQYN6I3ZsRN KG6YTTj4KN/f4Dx73rlYmAYu+wicCJx3gUOvA0e0IyrT6yAZxqHAEaQIBw7vjRmBE4ETgfNYwHl9 MeudKZ96wPHKIhOetYAzveqSYToOOLWhcgLEWBo0P70x+8BBy8Zvmp+5si+o8F2ciu3bbm08F6qr FgIblkQwRDBcAYO/NkzXkd0/2r6D06v2SUt2Q6TFjARDd40NFjKfLZfqcoC0N+YZDPTKEEuBYajn 1GRLBN7Qr7UwC9bCma4sSDAs3lrWfi2/+4bYCdhKWVB3LweJqj25Wr82cwNgNOoQmAOOUIxQfB+K +Jp7RrBsQfGGE6Ym+wAoZhGKXw+KkWORY7/iGOkad0AGIjBrG3c3HLc16irHup+gd+aYihz7ehyL xl2E4mJQpFeh2PF4bziAbPQgFLXBHb7IQg5fZH0SQJTMEEt7Ei9j74tc0te3WeeBIuWcaYi9F/Zb /roqXk8XyRI4CkADi/8iHiIeruABWijXDb4vIuXEtCodkxuOINuRqblAxep9vlvsbH1qrs0L3jQB nxkP51+MKtl+gGLnCIcIh/fgkG+2h5X9OQCHGw4525GFQneFQ9KDg6V8GYcqwiHC4UvAQR5kc0Di muVQnOCA0+lnJFBYjn0qHDpVz6jKsbfhQOwyJxwjHCIcvgwc8Dg4TM+zIzzyRMNd4SAu4JBGtyLC IcIhCA5kHBxuOH+AR+ab7wqH7AIO+EqzhAiHCIcIh/fhQMfBYXoSF+Hr+Yp54aC7cEBYLdQ4JcIh wuGrwCGXxx9j4DA9mYnIItmKLhwI7sNB2uhWRDhEOIzPVuxNudrq63Bg4lzGirPpqUz3DibBYahw qjcERb1fdFWZUlEfC1oMDgnn1szUj5E6QyiBq3c35sfhXJ4F6oyDrtV2ghIBDRTVcZ/bbZkbWbyA 
QCeNw53yPKyAzKTCXwnsV9ZZjYmC5o4UhV35PQtq3BQTf8N5b4po0gQjasai5k+50+gCNfXP+WIX wiYnRgncj3Stu+PpEzi1Wb+kgbFhDnQENseVT2PqosUeXiBk0nRZ1BBtOJ2r0tTLBtTYfS6Lwuz3 edVnZf9Srjbu34B+e/IUYUXpzBDiWeHlQGGoghJTBjWmOAmVpXxD2WqdgRhk/RnI0Fvb5zBumJEU tyeHJk3tgViDOcrgQLpdy0O+Nxud//2ff8/f5L4pxWWB5xMen114DLsmt60mcEHU8uyyLXYRAlfD MBrZFdkV2fVF2CVX207NyUn5KZNatk74EDa5qzZRfJmaE9wbsx3/cTBLSQwOx/hPhM2HwubdGhba MZQm9wF3ir9MDQvujdmFDU94TFNH2DwWbB4eDu/WsHThMLn61enlMjUsuDdmHw4olsZHODwWHH47 S+TdmpgubPgNsFmmJgb3xuzBhmXR7YmwibD5MNj8sPv19vlKizyi0kx3YiyTq3MJXDE4CJvuxSSU MYUDrjWk2CqpemOeYeNen+bULAeb9pgzpdH9G3JLwn2F+aGUmz007DUHk8vV+i+QDwboDHUK7or0 L7EWqY42X231dmPyws1QyeJ13yy6JKhZCU0wYVfEtrqfsNj9JMLxYeGIr8KxaMNxcnWyA9X1E9Hd T5Duna9Ik8Fqj0aAYQXBuN1iyksUreQZQ2my+IloSjM9lyVWvSG3JPL2da++s3AYEf0cAV1uIeTQ wNzkensCIuS7Ut9WKqyTeSRiJOKnJuJF0z0muEIdIk4uySZwc+UQEbMuEcebi362JOVusqgnseWb 2izDBV2uJLs95ucxFyPFIsU+M8UuuuQ5imnbdnqnt8FyRBloHeo/QXoWEeWD7/kkAKWF5T2Kqa5d 559nsaIomERBUil5MRfF/BuqkQMmmdfrQ4UwxvzaCCrviQyLDPusDNPyIOvIXc83NR2G3VCUrgY6 fcIntE17n0gHL67sRf56EnvnX3zv0CV903pexVyWmLFM18iBVapWW3/X1nq9LZrVEZMFkTmPyxx8 jTlt7y+5oZi8mFSQeRtziisXbS3DnN5TDdkakTmROb8nc8gVX63LnBuKwMdeX3xX5rRj8AgbnRAu Fj3AEu2cyJzInF8wh15jTse3uqEWvJhUC34jczq14AgXJjPqi9k5lPWY82e5OhzMJiInIudhkbMv 9qvcvORNHVajJzCspIifkENvqDBnODHpsr2O/Jj0JBE3WPoA5MwUkr5j0SctNFGw5KvVUJbbMn+R G732Z30FtDQhOmi9gkSG35OIwgU+FiI+hUrjqyrNzh2KKJ9eWsmkWl6lpYoqHVX6N1ZpclWlC3NS acan79KJFuNUuvPWxt856YeQ0qa9Mc8qHe+cHK2LmfbuRp77pbE6mNKXCPqmZv6iSO6E8rCymoiM L4gMehUZFrWQMd0KcCb2/MgwPWQYGpERkRGRMRsyWA8Z/qc59FkhY3qRbWL0VWR0PqHtTcgwvI8M HZERkRGRMRsykkFkiOnVYInFQ8iw+FZk9B0TN2ZERkRGRMZcyODXHBOWtJExvbAisdetjJ5u3YQM 27cybLQyIjIiMuZDhriKDJG2kHFD+NPa+ZFxYWXYiIyIjIiMuyPjTdO8lCvdOCbCdn/OyJh+wRvh VF+Wb7+HjNM/0YOH1HpuVN2g5jxm+5Z5kqhE0UUPDBP4G5qtv4wVXEMS1H99sKxoRt2KoDJoeTLO rRezL16MPq5Nfli9me3x4ARi4+TJ0GtLpILVvivNTpZO2taBbOWlwTUtSRBrGCIJhUPRb/pcN6Xh KTEKLMKao7SLgevOe9NDk2YXeTWWV8TzCg/yKpseSKEiG+ZVrx+WxVkxWHlAkOIpgqPB/k/UG5Nc 
FlAsyCsonxUFn+1aKf+Gal5hWFZFQaBeI6xM8I61HxF9EX2fCn14LPqmV71ToeZBH63RR6+hT30w +rx5qCL6Ivoi+h4TfWgs+vh09GUjvNQp6BNEM06srf7soi/TffQxxNTyVh+J6Ivoi+h7RPSxkQG6 bHqxEVU2KECHmJZC4OEAXWuIlLEu+tyYnQCd+0SyXICuxhMnpBjS0Kno87Ldkng2G1Ouivy42a2P z+cblbHvTDoE3hj0iwyMDHQ7/v7npgC+/a1HFXPOayZoctAPp1YVyTAD8QUURw7hJluYgrX7PPsx 21CEXyizbNCvmtVcSQqOE1n4DOS7FATNJmEJw1nI6kQKL/OQS4cHB4v9wa1hkEeIgvPMIkyniM0w tC48lD+BiNqsPWbLHFYyJF8hVxp4nyuhGMNhcm2sPK4PV3kWliO2ouqgDZNyWvbtaPYAbp7C7AQO mh4tLC5ok8mtheUWJkZsAftUErSxUOu+FVqxVm+b98YUZK3TbKg3+cWukqQ9bFsSzu3I/8X5/8Pu 1dHqCu3nuhY42UqNSk2L/5P7mmGnq0bOcVK3PYREHdzDmEnHFXc29eJJH1xwMZP7z7hAdWnHhaGZ ynBDk1jhL0DZlVt/4fZZGqyzICDQhDDfv6JeXcC8xKtxYJNtwojuiEHhUiIMQmAA+4C+CgPRKqVP 0OS6WJxl2vCwIjdKM7fYA4YgztxEvTFxzyFO7SINx3pPNfQNTW8O7d7Q/9SXdSjtzCt5yAt/uT3s 6HSo0dnnQYvAwlRM8Gu1qbTDIqw5gZPjFbElB4WLiXAZDZejcyEaO+Oiglam554gHNWngfHTf/3f PwLgQghBBZX4oiczGbA0SDr4nhsUEsQJRTzpjpkVXZYwtsx1azU82Uh7aXpPkDuHtFQGOJBa+1hW 7nyZo2kiWmywy1tHGNfUgBuzc+5kvtuuoRs+0d6JCZIjqCHgTYMvZNamgIejFsjCkqHK5J4kRj2K cz+dOljnncewbvq3+KNruT88vTnWymfzVJqdkQej3W4NX9w+aBbMcgWA+w9cDQC3QK32h1UBBc4M vFgeRighZFrFLHS+3m5f/eVUqpjg4gluEogp6F21DisZQRcgEG6U91vz3Lpvavtn/rY9bkAage9d hAorMKoW0OnJuA+tysAnU5zCG3/bHNyz5Zttbn7sVqVXEAsLO4znCVIYfOrvb05U+SbX+c4tCtg2 /eYQODfmaAObw7Z0q+Hn/qwrWEEcx1EpLNqkMoL9N3Dc5fWe5aNWaVjsBqcOatXLh6fLvcbAGqUg TPDQTV1heMpTYLvNKIhrBN6LQWTm41/6FLuXAlaYDOxRKFiCrI8ntV68AgLj4btT5o11EZQlqanm djBv/po4OLMAk0uH+Dnwv1//vT2WG7nW1ywHgZLGcsAJmVyigFWWjYtRtN9img5v69U/aw7WtL0Q P2bS+0X3EPFCbsmg1k6+ixDekLf8V5sDNK9ksCCSsC1eKycGzNpmHYAayuruwcALambpQqoVMqg3 PzRpdg/kS/xWLUjHRkEdYSZ3KgJchRMGaTW4GDtDNPw4j9mNgjKJl/RNoAUDREHnKoKq3hBkgtzK zMtvzvbyd5NScAFEkOH+0FGPGFCNnBoI0GbcnjlFJ9epo8KqWQO0xNnDbktvB2j9mLjFKey8ehQD tJ8SVTFA+znh8ms3K83OcGHTS4Ew04u7WTBm0vtFdLOim/UIRPitzJfdtjy8yd3JzXqv4BrzhjCh KSAwJZBwC7mfAuoTJhEtZWXuOxgOhHaHyNLuL6Ru/yKVXC7jZrV+wUk2V0eE6g2BweHcq9xfY1Wt 1v0rmDC+UBqbMHeLcC5gqa52+e64f8l3Tt5q85zbUr75DioU+bo2FWTLMCK0yiqxTQ3jszk4oc/1 
Kk7DatIeOPnlnDnikzOHYtfkvqpECAnz6ATNKK2TXz/3pzSaj97SwITcI6S/lpI4061Gs0hlkiTW fy+V+kKeBLRMci8ryP8gLDEEMiTWfQbUq8kpEV93G5bVYML4fJddrY3bHfLNthaGU1AJIgOPOPDU +vQZvDenZSUUCFcfhQkmvlQ2TKKuLvP77mBif+YrJ9jXyvpS2aBFzZDgXlSVwdnot/2zf2vCF1WH ndtDGZIAJoh45Z7LG7M/GH9DFwkGE+MpRbDq8v1utfFhtFy9/KXqjRV4cZigkjRPWbOEEo/yR0kp OZHOD28Cvn/rBXyzUzEKFsnEYhTnZnCcZHTQEmnHQdzXwEaYCe8LgDFTk/YmwRe7WvM8Jv79yl6d qZB5D2rlttHL2ZHU76aB1RdFgn2+feNoDfBvMshcBNNnpo0lJaSAOcJdg0ZJx43VZqurvQXsQh1m PzCFCygRUXItN4XJ9ao8/Kx2qxxa7K1Xb24cnW9KHzvT3vIM3XFY8f/sXd2S2zxyfZW5TFVqN/gn ePNd7WWu8gIs/Noqz0haSePP3qcPuklKpEZjCRxK1shIKqmNMm6AJHD6nEajWw3uuIDnaexrjGGT DONzoIaGVyx9lUsySH24mYLvF3NBuITVyVUeBl6BBTnugeu5180mLHdNbNc8zBBv4mS9SkjgQLaN CRz9m0MWQM9efDqK9ymtuiOOvaGKwAIyef6aS+al7SzhpzVdJ8zQ7+q8mzLJnpXv2cOvmmeQJVdr bU+38WqQ8d9bi9+TSccm5DJVgmh6OjMn6vzMnGuRUEUqsIrKJ4EZJuvYFtBAuGReA1PasLp9kUfv 0OI7zLtilcwFwjresrckgDGq+0mGGfCWN0dAumsEjsxF8cnMpbbehkuYy/EPFw/BLeOVIkdjDoMq Sf3vb3C2r/Xzt9abM5JwhZAMXDaX5q83l82b8MM9v24X3/dxCpkXQxE+cthYr8vFj9ZiXG2adcAq tDXgiMnTn9e5fi6CwErBOE3/ZWNeBiKNW4/h2Cw6kNSo4WT/kQZ+w8NrzOQtSgtcQL3f2Ntq/W3W pdwrATzTLrB9JClhFayYWgDzqXPjbnOVpbnSoyptHWyW7wPCYsHVZm46VRMRhl7n/pwORAOGTufd wH01vZQx5UKGNy5l2GMliXEzShugok7fIONiGAxRkfEP3r61WJzOe+ak5QZTkL+7Nua0STTTbGHN xvZ4Prft+ozhcOkcZpPD5DbBfUfRDfxcsN8de+bSqTYq7jZwGaOBlvX7Z8WrFFTm+cBoI2Vd/A/P aknA09C8D8BsWo0tYIcfSMSrKuLth8xg3TznodEGIgYPRSY80rWOLzdrt1glEtCj4B41mKw5oMYB BSfXiyLRxeBu29UaxxzBIhP65seXrcO/GgrORyZkVBA/gQ0cNoc6FQYC5ZkXOq50fiSDNSdmSKZM 8I5yJe46t6EDB3ryemstD5dUqppMBgcT/WXgIKQQnHDK4Spqfb5ImOCSQPYUrzhP/9scjTkCB6pc KOBQwKGAQy44sAvA4QOplYZOYA5p9WcNEUdV1mDMEThYqcht9FMBhwIOjwQO/AJwmF6Ci7B6Ajiw HHBIQ4zBAcYcggPxQcoCDgUcCjhcDA7LuD0EXsfHfTJtsCE4TC/JVbkTXUISOIha+KQKdPpvkZSE T3+RBEIFskIyEcXZ9YP/uP1bI2gyMBrTe/JUKSWVYDGZ10mz0MeJvEKN0tiG/F5Wvi1PCicEkDXo MvOK7jfl6Y5DrsFzfGuwhZKBwDCZNA+VZ4KY4Nvc224qZNpMbgkwPfsYZEJOb31ETKxOAswb9nHA B+XS58vY/UySWNnxmAAwJ394DIDBJ25D93+v+wxswcGTivMVzQrCFIS5BwozRJjJrbOhBt9JhBls 
f67dIUsZxmfyQ7nWMOYYYWAS8rbN1XDMQmEeE2DSW/P4mE3jV38v9wXSAGfqvNTGAlUzqy01XW0F /i5U9d8d4gpH/4SfTe8aWxy27AYDY+7T1s15HDI0ZyhmvpJCBUALQ3ss2Jtc/ozyKubCHvyTD8Fe MlBgr8Begb0Cex+DvclZzSTKXLaH/+QjsAcGCuwV2CuwV2AvP+I/hL3paczanmZ7x/hw9E8+BHvJ AMDe+C+UeCTYK7BQYOF3syFNpicw23NherjjNW5mLQX8kgULabrjMUdh+s7i48BC4R0FYD4BwGzW 7p/YYMef4h2KDACmyzSYULlAkHiiAdi5PEf6ketTMCY0ADv5F1cGmMMPbSud6wDMXG16/pBaio/f SExMaCQ2R8Ouijjftg97Xiy/NWuz+5qe5hn0uq1AsDufdyw6c6+tikSsofdmdg5Wd26VKqcEzA07 psOV4S7Dx8CD+sw6Yndbr3HulmJMRWOxGbn52azWARa4RDs0a63daWsyZZxU6Jw3PnzvHxDds8ir /6EqFrvssfQl170p7J3I8wpkqcSvYdlvd5ulW/9s4mb10rxukYVyRL7cWlZz9Tm7Yk2Ml4SMYXNS L/FABl3JdEWm0RnGaB1P9TPd6yUckGvm+79g6Wsk6uDPvu/36Ew75ls6Q+Pt9dLZIkNFL/3hPKSe wkNm9jfc+grec/Kqyy1+sOjNz//uSt5mOot5jZkoFF5sSYbQ6++gKCdF0Zx598fUFpZAWCLFbr5u +l2joLZTZlFLbryET9BZgbayG/gAsS18mMm5PoOzWC/ct9f1Jc5Cw39Gn5GtfYmrJzS//pj2TWP+ Lu07GpDpc2rv8WLuDwbxN7NINMJi0xfnbdzq5QWNcdjrmfjPiMdi582w1i+UgROZdetN5eFBsapW elrUHMY+Y2VeANnzVTxG9kSsK7hQisUMEzMPo2qGab/8T3Zvh3mrGRZHXBzxzR3xv1++7DXbsSN2 uh46YjXVERNhTqu2X3hJLiRTZwV8NwRjEEnVo9PzNCY44v1fMAqFJW8dhObMkXNV7KYGofEN/dWW 4d5tzHKLCAl12yGaAuFE5rKKs7az7SwudisL0E0VrtjcwHYhCY9FEkRUeBrwn9UyACzuFskLOEBt rNWo8k7lZu29o7h2rbEmQWNacJ1RaLoA6w24B83siae4Mlg9Hu192az+Rv+EccWY96wzu3WeNoPv +olCLwjWQKA+/Yew3CmB/8H16zDP+3FOmOnr+i9el13in8IYb27HARf5X4MOAYBMaWoYFEc8MZk1 KeZjDLP2e2DKKwtosl6sQ7NYuR14ZgV0QeX1yfgUbGG7/fpeTgwxVvABW9CTY7xVVV3OFvb/pNaX swX8eyuOfnhw2V6Omotrf+e0KvFEAMTnYDbtmRy0c6tQ8uTJYWfR1fVuve28oNFrns9k/Sx8Y26H fp8Hq1xrBxvcLna4wff7GzZmndn/bjbfllAssvasfd245xUWWpdwRHu2ptnNnORi+c9d3K39yXxy Qwd5Xf2l6QlOMroLneTxDx84CE1j3oWTPP8U9+EkhSA6gEJ/9Qe/hu0jaaY3+gP8WiJ9uk1l8s3z avUN48a2TYrK7EMSHCXt29obQhFDTeaUSoZWydAqGVoPmKH1OdQ2JIgD9fUniYRgeyJRczmVSDAm Th2S/4pIaJuWwHlkOAxBOOtrYO/HLAniFzEIGoJu24kun03T+SFAQJYFMUX9fzqWdDrBi+UneAkS OL6XpMYbt1r/bLzZGWzmt+/cDt6GhkxyC2mDHbmF+0ftOTqtkeDGLB+ReLKOqj9Hb7MPBjY5wW+Y Z7Hi3PYWD6bac/ncvpJz9pW/UjHqmTmJoJ5x+LrLsENquFhuAx6FU3S3NM/fzvwO5zanNEcS3GzX 
iyUWPGjs12QLl3IJmwwMXZntLHyi5/7N2QIPRg7YjpieP+4UDSI7JfB8wGGYgH7gMvsxfRxkItDE JrjQNw+b0HjuCz1gSqB2CrPRYA8F8OZN586piUhVMpPb5m+EmDan7Ca42TXbxRd47BezBREuwZ67 m7anL974l3eShQLpopm4RSWdemOVCG8vy9od/ZOkJzKShWRl9VsD4x/CbbboEO8hVnSdLTp7HOqe 9/xjCYF3LM5YpYkYYdrDsT5ut2vhh2FObF6QJj0qAeK42/xs26Xvc4AlLAuem5Gpa2xB/gu2XOez 5XtNeWYCRcuL36eSUCYx95RVuQGueWOUuq5E3e6pPuxGYK2x82kXR3ZqDVTbPn87igbKvAdMhqx8 Y4iFbEtz1k77FGFFt1kt30v5DYTRgxdX0724jjLYS714//YqE88exe+HoFR5OS6YlcZU+vRftEPc gmjryp7lIpO9OLwh2IWHky0M32Umrd1v0nyiFaLSLc4szXK1fQ4BnlKAJ6ryMurSvNBF9vMa2pOY FMryHCU3OpqBQQD7174aTGbRHW6C72n/6EFdfnbLFbHiS9jtfr7Fiu4IIg6wQk+uUUOUkN7kiXIp A6GZtfFGWAFjyt99BMGY5mfPHidf6gbbaVVskh8y25AYznK7Sjx9G16QBeAJnMva6vd8jMCNFJhs gU7cfTVI5jxm4uZZukKKnWGSQ85x2kz9aVAFbNjmnXAo7TB0/D1uezMY7vS5ZmIgfcihazcIL1zd TaDhGHaqOP6vAexMPPlECKhOwM7h5JNRocwYduCHD8JOVWCnwE6Bnc8OO+oDsFP/ku0wSmqix0cQ 6YcPwk5dYKfAToGdzw471Qdgx56BHcv0GHbghw/Cji2wU2CnwM5nh52JlzkRAvyvYSd9NDeGHfjh g7DjC+wU2Cmw8ylgp9+eMLIxkQxgp/4A7MT8kHL1UdiJBXYK7BTYuUvY+UXpCh6EsD3ssLSVn7oK FvndFvipQle/yl2DRPoPdV2HMUe5aziJ2+WuHSYRr3TqPTNEcEM8XreDndPdN5GQC6oyM2weKsvs 9HUTPuW6SZ3eL143SUiwhUo9XW5BzM6mF1w7316jwstBr9uvzTos/WL5pYkb84I1HCqP8GCy8pBm LFUx4+0BQXSs7MDBbtusPrw5Vuf5VRYqW3W3qg5lLmuL1yPqzOsWd13m8ir3aa6QsTln3amEtq5i HYg9+8XWNT5sAlT+w+JTVWbG7Hw3TWoiOg++XyV3xgSs2X4dMoH34h6JCUBOCxKC7Gu1VF6Q08Lq N/dszyL+eAiqj36QQyZgWLC3EyBtMzkruNBXa+w2a95u0R5Fe9wEcY61xy8QR0/WHjrkXuTv/usD 2iONOdYeMImbag+oJhCkvCLizJPaXzRM0TBFwxQNUzTMAcSKhplJw/zf09PmdblMkIL3AbtfkVAw 8tTxissIxY+4ta/R76lKL0vS/3AvNBtY7jJvJ3S91poHc6Lr9TmqcukVX7juCuX/ydGY8q3F24oj QS8gXKWL2y8wQzJRA2Z06xSwQuI1oyw6BmaQ1h3MkHwrUOxDAeR8a68OAkxA1dwqD/xn6nSdJsP0 aDJk0lyu0us6veftz6U7ronanr7oyAewoqbCCoQ/groEVgYLgddpsV8ec+GVG7UZwTHp+AAWYhS/ oSbquS80tZQZviHYKs+rL431293G7BoHTJPi9fC84slzR3DuGKkqWnV9GnDpg0SDF0arLD4Ndpge 2yH5ZgpWXSFaw+pp0RqoASTphN6E58nDu9GadsxRSyScxI2jNTiJa50Ul2jNHxmtKRGWEmEpEZYS YbnqKfFAsRANXnPPAjh96shANgtgwQc94czmrDcbzvaIBcCYfHhmw0XlHqmLw8z1y5mpTed1O0dp 
GGxFE/P2omGy6klAX90Jy/vl9i+7RglCZ/oD1OMShBa25f2UINyYhd+t1qe2JQ+2EoNtKZ+63Zl9 lFqTmNWBbI40zjTm+CgVJnHrNE6YxPXSOO9Wqz8Ys76VxflatsxazXA+U/OWMrzPooOztzMzsiLk KHFJBjMlcWmmJ7xKAtQsvvxT1Au8PB7GJ8fDRCVO9zPr/oIyErQlbj6X2445iofhJG4cDxNEalqy l0o8rGQvldhaia3NtJdKbO2zxtaSiOfqwChEG1vj+bE1WD6XxdbG/tKefUv9bHEI7/XRmKPYGqGG ukeKrc0or0qY7nHDdH6gGJ6e/jXe4YzvizwwWrW5PxOKPNAYa8muceNhv6G5C05QPR5T+OMf7G01 QzurK+1wrqg0bVTnS1iGzcI1r8v18+uX5N6+L1zoyVBeA0IemYvAEaCTK7aZgK7um8UaW7riRhVV Xg7J/Ko+6rQc/4I+rMnSJvz7NWwRj7Bns6zzEp2uQoq4IgIrt/ef5miqtI2+xbxMJcUSVAIOvNqX xa6xi1WfN2ozezIkptp2q0w2GrgG1/fZxRtwNI+nchGpNW3mU9PNbbFKRLqbISpPllUTPpmUtj5h EpSHRpTPy80i3InOHPowaNeKdVG+t93yJIEkKynz7v5JIrEjBZj9/tKyuL6VMtjzeTgvTM1DGwUY WoK3l+exRfqkMC80k/yF+9q2eAYRSKu2lXLeioGpiTdTI1OmFvv+JHtLMC2GtjKXMdXOHCZlDTQZ TIr3OTSL5crjgvbwHbjNa9AsoqraCDB8WYBCC/wJ5okR5dyGyJr62IqGZdLPtp0dfgwHfRyYzdOF mjG8v3p46r1BbBHl88K3wlIFoQNrns3ShcYvNhDvgQ/TbNI2eV6k7Rd8s4TQRPJr+QgBETunBnAI Db4b+xpjUmL+oH4CbkKfBxVSadVlmu8tVQRcn+F52k7UERvc9l3Mm7hZvaT/FdIrQNjmiD3C5eaw e9ujBD74MC6A6ymzpY7SArc3xNNgRe1jDNBwmJ6vjXUL/zdPX2VeB0Jlf0jxn7BBf+IwPTdktjfX 1vaXmPsXZjAokwdgn0C2H5N6HBHvtXAtRNADUj+9H7lJLvotqWfjvxDVgYLL2mvizobQh7dwtCRC H1n0R2MyXRVSf5Y4JlY/gN83NgHWWBZuFKFQhEIRCkUofBKhUFn3a6Hg6qITik4ohL4Q+s9N6Cc2 noDyPGkNniL0RzF5OluUvh1T+OMfSpS+EPpC6AuhL4S+EPoS+S+M/tNG/pmEeuvJ7OvyxawTThhc 77XDtLOsjNWiOYrmuNvMoKntX2gI1F6iOWbMDMIxh5lB7Q9FcxR9UPRB0QdFHzyIPigB/yIP7l8e SBaP5QGvdZEHRR48jjzQdOKRBFWWy8suDnT/hAinuZXqUnmw//vxmKOLA3jEctPmLEUexCIPijwo 8qDIg5GhIg+KPPiz5IHwb+WBKPKgyINPJw9wxJMZS3riveJE1QWnN85YwjGHGUvtD+X0oGQsFclx Ncnh5EnJUdO7kRxen5YcSOaL5CiSo2QsFc3xGTSHcEyLvw5l1folL+Fb0fMdmIrqKKrjU6iOqRef Ew8SN65m1I45Uh34Q1EdRXUU1VEOOspBR1EdRXUU1fF5VYekUso3Jx1YVbecdBTN8SiaY+rdbKIr e7rY0ruJUDwwRcV5DDr++/GYJRGqaI6iOYrmOJgsmqNojqI5iuZ4CM2h6jeaw7OiOYrmuHfNsfbx +XXfmuXo8gVn1V5zcKLav2FP//W//8jQHMTVNHrjj08xxBXPOboxNTn6QRXN8Qj64Cq6qGiOD2kO LtMGg9wlpFTJ9Sc+b9br55+wbhTYY3WuL0ykvvOFrb3Ds3L4KiqPzPNaYce/H8+rL431aVWbXeOg 
Ax2F6fGsT8y50Zr21sC14v7AVP+QOS9bB7W3hLoFXXWL8gYdmMt8dcTGMHh1kIMBbw370OVpP24j Y/1nhRnG1QY3mYTVltcPl0OzTdurvvSMadHBtFCwsMxS+Mpx2jOlzr0mJbnGrocU2h6yPHKoDJX1 nsODIaBxCvi7zWsJKWxVwyvrfGu32WlmY0lhBYHl/7dtvr2uvUH2gABMSR4azTUf7fzICsUeevl2 eKDHz0UmPFYi6wq8wLfd1441CpAXVV4nQ2LSTk5W3NfFs2+So+sUOc1zcmkyTI8mQybNpQ5HcyGX zOXM/9ttVstDYHnUfC+JF7tv5yuohGQWWqvsAjxE6dq644r655NZktY9NwTa6Yfwh+693Zh6nLsi xAlKdgWS15FYcSFVndp8T3lHYbesEwPru9yhN9V5OvA6DTtVEET1s+vWPWfgZbjJW/pKy+AHOrUz dj8ylcmK4JdIM3wBnb/+sooR8xmxCBvPQ0GlXd1LzO5ZDbw3nxd6UDoG0inMHgTvS2AO236ewR4Q mwhBmdhjhKr8ZY0/j/7vsy97MNvWwHhMPiw4bCtR3eZQK1MmT8Weu+zWeYUum1dc/ZvX5T/WZrPb nlz9Tg9Wf/3UbYLs1c90NPTSI932mZNCUOc9bz8EV9Qz7sh4zNodj/k7Vj89J7amhlfwDY3kZ7ds EWAzW8MqbvAkoz2qabufb0Jc4EoLGnV7ZgN0FTGwu00G/etzwL7Wq9ddfx6Xef7DtcUjBzgrhOds ktJ+xVgKPm3egYPyPPCeFqxX+JAMDi50HmRUPLAOMvYNsnkE9iNkXiS2Ehy1f4PT6T4kIkWe275C h/LZLYpa0vYob/PSf8UKXlrIPC2JCk9L/rNatuepi+1u4bYYFAOKkSeQrnXLZN5JGiYjvrp2i/aH lxiEYSFPn1YiyQPY84mENqMm79SC40tg9jvOSeZ16wmHSGyPUhGJml34sYMPgafuVGPiXczDokpI EjtSu39nmMJHVd6jXtGzE9rAyWz6FK+bf7/x7J4cPLviT52Dz/PsRNYy1rfltd2YI15LmLeF1xZe O+K1INJPqzoerK4Hq18+dZsgd/WHwOsTvPb9mm2UJTwSFyzN9wy0Y9aHEFM3CXHL61HdJM7SwYmr P+lWIuhp4qhNPnFksTJAHNeblQvb7cAaLNg8xvEn8D0lOJ4bAfNps47gITHifFbKHFuSGNZvNQUG 5c1mY+DArYJTi7wInBAh4oHK63LxA458g3lJLMN9f9l+AWxUgGmS5KXuKM8cRLncczCbFmzhnKzC aP8DktFCH3O957xR1U/BHLejkOgvfad66lxoru+MsvaXxYQyOdcvfCeMOfKdOInb+c5+fSpHzp1K T2aOYBtwI8mbBpZrm40K4Q08Dc+Ci7sO4cxq7E/w61ewSLRolTXaSqjtVi8vaIzD8sjD/2TOMz80 h24OvkHeqiWm8vCg20RdmvS06EuMbRNMgHfIPHsi1hU/4h3bsPQt72Aac3PyFpwgynDSz3GYQusx izOPeszsiP+fvXNrbh1HDvBf8WOqprYWd5B5SFUqedlKJdma/QEsEBdba1nS6HIu+fVBg5REUpRJ UKYsn4Ef1js6coMX9IdGd6N7pvDkxy19XzBrNYzYc1KOIywnZq2WkpU4sz1Zq+1v0PP2k5VIQNbO 2LAKISzTGcs6EvOLFAmZslaHpKaTcilrNZ2UGxXWTCflfsWTcrOUISfE5cGQMnYZdifbY6pzUNq4 XcCjH5KjEl4rSHrerg+r+thatR/L4Z3oONfGo6XdEnv9/ibcXkq/He3wWVq7uerwoeJorQpCwflD EItOv1Uuc7kt56gld83hU48pPj0JKMuxG3pDUx0+VOXBeLGr4P8oXraBgseDAXHHH7w0w0EHaymn 
MLkLR25RnL9eSCazavlYqdU6TLLgqIeJH2ViwHWVtnFdTXk8JCdFLm1ep51qCAQz8nBUJxIHLqps 5SXz29HWjep4v/8HbUkp/leU1f/8l788/fe///5f/nfzC/7veefv//Fz9/sfT/73y/r70z/A6Br4 g4F/HvyBcMa1f9styqU3DwaH2Kvd68Uf//0/TmL2fjXxvzcL8+TU/sVuK3g+/fQLzLP/r/XS2O3A KIvVYn/FH841Uic8PkG9m/o/LvHYe/01qvxGWyh5sz+8d4hreKzGbPjD64t43x9+9S4i8di+iFYs uW+Ifjz2Xsz0WPKguIhY8rCssT7nEZJG+ZxH3F3k7n9QIqU4y2i96VpA3lW5Whf7deH/L5jowXYZ nl5tkarEIev1fBgR/B8LHf57o4I3hcLxOmyiJEdFcEdIy02wG69GcLHEFxHcTxMrpcqq7Zkpluv1 a1gLK8cHinrlRFiNUTWtT4IEOMywihMkheXg7DGbKuBUXUwZJwNpU92WX0xe/QPbv/hZvYSMhjLs wbRRcQJLQWGOvK32/rKKFUzlzWIb/DEw5zIWeX0OwT1eXJ223TDAGAzEhAHufq+ixCpkg2yKlf0e FAxSy8HD4eIoPDaSMCxprLk1wfqpN3rmaDScR+UsFwQ1jAZyMhouIgDvGQ3e5i1J7shg3YpMNMPP nvIoH22XUJ3nsnGwIoxpL2plIHpHo6GqlWH0TEZD9YT8rNhvf8Ji1YhGhgI3VMRBiGZCyPPWH/b+ MgCRxAFxlCdieAEFLxeIUXqzKIxfl711W8CSty7/WalS8EzoyHvkjrXvEU27xWEHxwRlfFs8bxVY MX9FFxZ86ShuKGPdoqsvHDeojMaqMUVkpihjrQeKN/SgHpN/qjJiRBnDNv8aylg9wLBIrsLMD6G3 4GUXceY7YSa4PU9Tqzh59nBOgtt+6JF0BWpa9gtEE8SN8X7eizljvJ+fB4fX3drtF9s/zCUcmNCc NuDAboCDczgODjdo7mlMq0eo8gxwCPdCLBaO6dzjYS44aBmOQJ5eYT0l8siNr66CU00xIeM5Xs5H XE5S3bGq+13t9YtZPwfNPavuWZvOqsunqy6l7sLv9r7qMm/xDa8n7SEIan9gy+6YLTt+1nU9YojJ njl4QrCuh2lVbO3+sD06mcygSdT2iVARstqPs6GWErmgeyklaUvhdJKYD7iYhIB40x4PmfbiBgTI +5v2fszPNu0FMTSZ9sm0/6pwOJv2F3BgUmnSgMM7abhDcOBmnGnf/eAW056bzzXtuXBEUiJlMu2T 6s5q2uNe0765rmc3qK6b3bT3Q7RNe+6SaZ9M+4SACNOeXObd4Jw3EJBPR0D2CaZ99vmmPcO8TKZ9 Mu2/KhzOpn0fHM5lO5+eMLoBDvn9vfZZ/slee5WhLOeSzwaHZNr/qVX3ZNqTXtO+sSvH+AbVNbOb 9n6ItmmfmWTaJ9M+ISDCtKen7LjQipUba6GQ7xkBZDoCSjzCtM/Ih5r2fszPNe1RKY1LXvtk2n9Z OJxNe3rhtefONPb9+IZsvVLe37Qv5eea9oZbwyTWg6cDkmmfVPcm0572mvaNXDp8Qy5dmc9u2pd5 x7Qv82TaJ9M+IWAQAfabXe13f62dbl3HnMkaCJiek8cwh5V0HAKqP3FYZHjQZOv8PemMKS+7g91t 9fY/x3+YafWm3Aoo6FeVEjB2qX5Cc+z19rUIB0AVHPsqI88DcuzgCGhV2cZP19M8cyLy4pwAQXA5 dnu2xRWUjXFxR+WmFJIa9fhCbaXuFaIpF5hwE4UbfAU3zc2CmG5xMBmLm5JgHY0b3BnzEjd3sTi+ MG6Y4wKMhm9vcMC0aPXvpJHTPgEnAecqcMgV4LAGcKbnFCIda99kCAt0E3BgzAScBJwEnIcEDu0H 
jm76VKZnQjKMo4ADhXt4dhtwYMwEnAScBJzHAs7ri11u7PapA5ygLIqLvAGc6XmXDNNxwKl+yRMg xtLg+NMZswscdF8PzvFnrvgL0qGOk16/bZY2cKFqoRBZsiSBIYGhBwyhHZipfbvd41Sq4dplN/ha 7EgwtOeYGqW1jf4BWWfMMxhozxD3AsNQ1anJlgg8ofe1MI/WwgldDYZlOsIxTN5a1m6pvoWa2Rxs pTyuvlfiWOLYOxzDvRwzosGxG46F2jzShUMYHmEadIZoR6n9B780xxJxEnG+MnFITZxaZcEWIRKz Rr4ru+E0qy17idMEjEAsu4k4ZZc4ZSLOZxLHq3PVpaVcvh6vDVT7OHEjM90SvhK+ruOL9uKrma7P bjiJa837+ArfwC0SKK3weHyRUuUt/gWJly7od9vZfS2PEBWCGXBBa/dH8brQr6c+qQRy4mlkFlzC Q8JDDx6glnBd6frCYUxswy/EbziL60ZGqCIVq/P9tnXjQoSqyQuBs7tYN+cPRuUuP0DWb4JDgsM1 OBSr9X7hfg7A4YbTvm5kvsyHwoF34OCouM/WJ8EhweGXgIPaq+NJgT7LQZ/ggLPphwVQXKh5Khxa 6b+oCjU34UDcfY76JTgkOPwycMDj4DA93IzwyNT+D4WDvIBDlrYVCQ4JDlFwIOPgcEMaPh4Zw/1Q OOQXcMA9VQMSHBIcEhyuw4GOg8P0cCvC/fGKeeFg2nBAuLxTBZEEhwSHXwUOhTr8GAOH6cFMRO4S rWjDgeAuHJRL24oEhwSH8dGKnd0u1qYfDkyqMxzy6aFM/wwmwWGwLXx7CIo6H7RVmVJZn465Gxy4 EM7OVJiQekOIQw/aFTSaP6VngTrjqP7SXhCXUEmwPOwKt94WVukXEOilidAJPi5J3mYy9MYNM+us xqSEKocUxfW+ngU1/hJ5aPXduUQ06QITasai5rvaGNRFTdW5jOXmhBrCJgdGCTQKGt+ciCquBZQI JIPPufXXx1O7pzFNs2W4tFzJe9shxFhB5zpNE2QDatyuUFrb3a6o6hvtXraLlf8b0O9AHh2XPs4s IYEVQQ4khpaQYsogxxTzWFllqKxazTMQg1w4ChjbvnwO44ZZRXHz4tCkS3sg1mCBcjiX7ZZqX+zs yhR/+/vfije1O6bissiTBI/PLnxhJh0Z1mDX5PrNBDol9bGrofUSd9jlDTU8mNP4Lrtcg12ECME5 p4ldiV2JXb8Iu9Ri3co5OSk/ZcqoxlkcwiaXlyaluE/OCe6M2fT/eJhlJDmHk/8nweZTYXM1h4W2 DKXJBbG94t8nhwV3xmzDRnCRwtQJNo8Fm4eHw9UcljYcJme/er28Tw4L7ozZhQNKqfEJDo8Fhz+d JXI1J6YNG3EDbO6TE4M7Y3Zgw/K07UmwSbD5NNj8cLvl+vlcKa6hnFluWj6Wydm5BHrtDcKm3aGD MlbiUf39oJNejpV2mKHOmA3YKM5zqe8X22qOOVMYPTwhPyX8Kyz2W7XaQd1au7eFWix/g3gwQGeo YG5bJMeElbXI8uCKxdqsV7ZZpoSlMiWJYg9LMdxLMd2k2OQ0Yk+U/qPL7W+QJhAozakYfM4nASjT 7hh/P0tsm0xwP9ldKZbrMhN6LoqFJ1Qjx7/DIuj1vkIYY2FuRMVrEsMSw740wy4qzzEpStRi2ORs ZwLdEYcYlncYFmOJYUxxbhlBHYlNS8yWmSb38zE1x0yWWKJYotg9KHZRgM5TzLjmfnJ6hSlPlKH6 mfCNFsUIQYYMJsgerxYTnCvZrDweJJ4pRgg21N7dU+6tJTOX86p6Qn5KFM1W8aEnQRy6wjUCusCc g9YntjDrQvurKpWGOY+zUIkzrgcKpUrQ7Ci2wqGXlYFCqlT6N6HwEVFo1F7VvrXOptS2UHhD2ng5 
UIsTvmFc1vlGNtjSteOb60j8rO4sxw/qQfVcBp11zNTWF8zScrEOTaGWy7U+zo7kzk/MeVzm4D7m NDeR/IZ0bz0qZdJ9KHP0p3WESsxJzEnMGcEc0rPlazPnhjTtcX12yccy57LPbmJOYk5izuMwh/Yx p7W3uiFbW/dna5++ISRshU5eIoLzDMlsMODfYQ4inTHbhAidE+7BnA5Kh9zLNzCHsg5zvm8X+71d JeQk5DwscnZ6tyjsS9HpqXlMyxQn5NAbcsAZ5ja7bzWiMCY9ScRHLH0CcmaKz31gWibVhpQw5avZ sN2ut8WLWpllOI0roegIMVHzFSQyfE0iihf4WIj4EiqNe1WanWsIUTE9+ZGp8v4q7cdMKp1U+s+r 0qRXpbU9qTQT01dpbuQ4lW49tfGdr8MQSrmsM+ZZpX/B/o0zdb7WuQnbjaIIU2Oxt9sQkQ5lx0Ir R+GFirgQd0LGL4gM2osMhxrImG4FeBN7fmTYDjIsTchIyEjImA0ZrIOM8HM8llkhY3quLremFxmt bxh3EzKs6CLDJGQkZCRkzIYMPogMOT0bjDs8hAyHb0VGd2Pix0zISMhIyJgLGaJvY8J4ExnTEyu4 67cyOrp1EzJc18pwycpIyEjImA8ZshcZMmsg4wb3p3PzI+PCynAJGQkZCRkfjow3Q4utWpjjxkS6 9s8ZGdNbsBFBe46MXEPG6U/MYNWBzjYqI50xWweRecnL+1UIry+C+fuc69yxk8JAEDS8PphWNKd+ RlAVNT2ZEC6I2ekXaw5LW+wXb3Z92MNRPQvH62Ibi6gSZvtmazdq66WtPcgWQRo0UuFRrGGIcApn q9/MOW/KwF1iFJmENUdqF4Otu+hcHpp0dYlXY3lFAq86SRtVNxbe3BXl0x0pVObjeXXURpzrwcwD gkqRIaj1En6jzpjkMoHijryC9FmpxWyNn8ITqnmFYVppTSBfIy5N8ANzPxL6Evq+FPpwC33XTbV8 etY7leUw+jrF+kahj9boo33oKz8ZfcE8LBP6EvoS+h4TfWgs+sR09OUjdqlT0CeJYYI4V/1uoy83 XfQxxMr7W30koS+hL6HvEdHHWg665oZX45w30Dc92YiWLs5B5ywqkR3HpXqIjLE2+vyYLQed/8bd W/hRQYge0tCp6Auy/ZR4tiu7XejisNosD8/nnsc4FMIaAm9y+iUGJgb6FX/3c6WBb//ZoYo9xzU5 muz0w5krNY9gIGKuKs06noEs44hlnTG7Z79IJu9q/lFtNcNzBSkE5kqHCORVCoJmk7iA4Sxk9SJl kLkvlMeDh8Vu7+cwyCOkhPPMMk6niMsxVHHdb38CEY1dBsxuC5jJEHyFWGlkx1VCMYbD5MY6dVju e3kWFyN2sirYCBflteyPg90BuEUGVydx1OVR7bCmx0huLaxwcGHEaVineNTCQp1/K7RirVkfnxsr IWqd5YMNhLurCs862HYkntuJ/3fn/w+3Kw/OVGi/sIEtb/B/cl0z7HXVqjlO6jaHUE0juhqTt7bi WKO7B32wFnKm7T8TEtWpHReGZlW8Nc7QJE6qMhia69AS+ywN5lkUECgnLNSvqGcXMI8HNY6s1U0Y MS0xKF5KgkEMDGAdMH0w8NaUaMBgcl4sznNjRWQEmOZ+skcMQby5iTpj4s6GOHN36fbRuauhNzS9 24d/Qv9WF3EujTev1L7Qof08rOh0qNDZ10GLxNJWTAhz9Zhph2VccQIvJyhiQw6KF5PgMhouB7+F ONoZFxm0KjvXBBGoPg2Mn/7lf/43Ai6EEKSpwhc1mbvFCy/3hYPP+YhCggShSPD2mHmncxBj9+m+ WMOTjbSXptcE+WCXVpkDDpQxwZdV+L3MwR49WmywsmRLmDDUwjZm47eTxWa9hL4gxIRNTJQcSS2B 
3TTshezSarg56oAsjA9lJnckMRpQXITLqZ11YfMY11jklv3oUu32T2+eterZPm3txqq9NX61hhe3 i7oK5kQJgPs/6JICTQcWu/1CQ4Izg12siCOUlCqrfBamWK7Xr6EXQqknbPGksBx8CmZTzcNKRlRH KyJsGfatReH8m1p/L97WhxVII/DeZawwjVE1gU53JoJrVUXeWSkoPPG31d7fW7FaF/bHZrENCuJg YsfxnKMSw57625sXtX1Ty2LjJwUsm2FxiLw25mkDi8N662fDz91ZV3AJfhxPpThvU5kTHN7AYVPU a1bwWmVxvhuceahVDx/urggaA3OUgjApYhf1EsNdnhzbTUaBXyOyRRBRefB/mZPvXkmYYSqyRqFk HLngT2o8+BIIjIeb4c3r6yIo55mtrm1v30JXEjizABeXDfFz4J9f/7k+bFdqafosB4lOPgrMyeQU BVzm+TgfRfMpZtnwsl792fFgTXMXEsbknQ/ah4jvtC0Z1NrJrW/gCQXLf7HaQ/FKBhOCxy3xpvRi wKw9zgNQQ1W1uons1TVLFVJTIos614cmXd0D7SX+VCVI3/OCKmXyBmEmVyoCXM3sBYUhjvw4j3kR eat7x89MmPoiKi9oPlcSFESCXGVtv61NFQSCRRGWVx0XxXhkp0fypyZMDfhnLSVnTNHJaepIu3JW /yyBrHStm92Yw5i4gYzQAzD5Z78kqpJ/9mvC5f1dVtawgdj0TCDMzN13WTAm73yQdllpl/UIRPhT mS+b9Xb/pjanXda1oyZYHAkTGwECUwJJP5G7EaAuYXgzEZD5dzDsB20PkWftD5RpfpApoe4TAWp8 IEg+V0GE6gmBwbHWr0XoYlXN1t0rmDAhTxrbqDAJI0JImKqLTbE57F6KjZe3WD0XbqveQgEVikJa WxllyzAiTZlXYo8pjM9274U+17M4i0tJe+DYl9/MkRCb2evNMfRVxUFI3I5O0pzSOvb1c3eKogXn LY2Mxz1C9OteEmdqajSLVKYId+G9VOoLYRLQMiWCrKj9B2HcEgiQOP8dUK9jSImEtNu4oAaTNoS7 3GJp/epQrNa1MJyBShAVecJBZC5Ez+C5eS3bQn5w9VW4QB4yZeMkmqqX3zcPE/ezWHjBIVU2ZMpG TWqGpAiiqgDOyrztnsNTkyGnOu7YHsqRAjAtAcmByyu729vQoItEg4mJjCKYdcVus1gVQWj58ltV Giuyb5ikihzvsmYJJQHljxJR8iL9Pvzo7/396Wl7WK384vO095ZX/SkYIpJHpqK8NsT2pNPSs2RB p5o4OC9NaQdNnFscybQkVArUGbNp8/jFOZyvOL+vL974BpYHlJ+PFATkhiA+5WDYUxE1ez/YbpjB AIOTZVxdniwr7A+9POwW305WCY+zmJhxFPY0h9XiRyXRrbfevAuO+RyWLhW32sxz1oxZFsoChss0 z97qbCDZW51h8yUj1x5F0ekl1e0X4TEaeIw2bl0UGQsTqD6ccZYFayzGUSdwZrIpSKYtOdmNHn0w Y3IG2pLHWtkfdQZ9plsVWalBWb653elNlOAciVQ6kSNWL2onOQxmh3iYdAtY+5tr2NVtupxetxBT b0JerFDNgup+7VWtgg6Y5f4dRGSBwxAStT8w5aXEX2cN++hFh5dUhXyjb7qyMLd2adUO5qyrnPGx PVY/cPPLtQ6pY3BxW6u//RZMaJBDPnunSbkW1R5YbyHzsoD+tKd7DXmTmMetga50mNTWfvDMIht8 n3EvgJR+NlbAtj/CzlxKF1IdI03zj/F+utIi1rgpNOGW5nJWbjd6sfZGwJGCJ2oQnlOgxpmCk4tD IKddbyPsmy35CwHtMVtYJCy7u7OyWvBno+DHGROpBXaKlV6FA+49y5Lzc0aqzNFkOChnxsGBccYo 
opjCuZN8uCIIoxxBrJRKSv3/qs6YLThgoW2CQ4JDgkMsHMgIONyQSKHwBMvBz/6oIRzOOmO24FBy ge6zf0pwSHD4leBAR8Bher0NRPIJcCAxcPBDtOEAYzbhgIzlPMEhwSHBYTQcVm53drx2oodE5k04 TK+/IXVPSfB+OPgNgoRtBSfMscH547+f1d9VDDPTHtMY9CSF4IIR5zcqmd+z3OUYSve25oLDw2Zk P7Kj1BoanhpMfC/AkpDwEcfSDwKDNVV+TH0paNqVfAYWVAML0499KCfHYeGs1UL71xehs4QjJ8v2 mICF3g9+CSxUd1w53L9vjllSjML6x4aLjiTCJMJ8NmFUZkWTMJO7W0KZnF7CNNSfZvp8chTGJ3yE +v8/e9fW4zaOrP+KgH3J4GwyvEmiAuxgO5PkbIDpTJAMzi72RaBEqmO0b+NLNtlff1glyZbc7pYp y467w36YSZRWkZLIqu+rKlbdr6JgzLaGgUmE5+1/gmOeLm3pBzj/esEKxr41jY+Zpnr2n+mmhgno mcTt7IZXVQOoqiZHivpzJMPvVVX1dwdvwM4tvDMpqy2x2VUTBLSxD2NEZ+bpgKEhHSj4akCUWt6m iz8xXxnS9iBrL3bLyPIK1CO0J6X2elcooTwuXNUe3HKU2rMCvNrzas+rPa/2jlN7vXORSRG6oj28 5Ri1BwK82vNqz6s9r/aOCwT2Tz6W2X60t6sfdm45Su1ZAaD22r8Riaek9rxa8Grhe6MhSfqnHWdd bno4mdXuNxkKuOKkFihh7TFbbvpK4tNRCx53eAXzCBTMYp6/wBr4uqVgquzEiDQUTJVp0KN8gSDF nh4dXdmJ9JhDTzAm9OjY+xsnVjDbC2W1+9MomKEq6f8g9Y6efq8P0aPXxxA9NWKS67LDx3g0vU3n avXZPs0Y+HoWA2HPtVtYdOB2GDEpsM7NndnlsLodazSIPBIwN2xqCgd9qwwfBQ+q3SZ2uTWVhu76 waKi7Gk9Ud/S2dzAAg9RDnVaaxfaPSRSeRihcV5o86V+QDTPwq1qRxSzosoes19yXovC9kbcrdtr ZPE1LPvlajHN59/SYjGbpOslolCOms9xmw/WiuSElSwmVjOaxX43SibyBl+KST84wxhNin0txzZ8 CQfkkun6N5j9GhY66M73fR+cKce8C2docX6+1FkayPOlHxyHJH1wyMD2hmc6hvdsrep0iR+s0Orb /1Rl6RyNxbDCVCEiPI5iBaHVX0FRRoqk2fHEjkoyWAJmihA7/byod00EFZncypdZaTqET1BJgc5v C/gAVVtvR8z1GIzFfJTfrueHGAsJf0ab4cx9SZ706E95HPe1Y34v7tsasLvL5tPzuT8xFX82iUSi WkzrMoxpPptMUBiHve6o/xnRWJA0bVZ1hOJtwrG2rIo1PCjWwrJPi5xDZWNgVhyUbHftjZY8USQx HAPFEoQWmZtWDUImsTOIo49k0BqE3hB7Q3x2Q/zn5GbD2XYNcaLDpiGO+hpiItR+1vaAleQiZFEn ga+GYAw8qbIVPbdj1o2iyz5IFMpBntsJzVlOumrP9XVC4xv6BZv2pKuFmi5RQ6bZugBvCrgTWe5U UrWcbSVxtJploLpphCvW1bHtQYIHCQ/5TBON0b3a9Vq6yIqFMePREk1eDJ5Ykbv5Tk8ldtC23BGX edkcILWq3G6QqsT0dKYBOkjAStSxz87QraLtjtNVY7FUjxYshWiA/YOZriKBf8jrxe5mYjknDCvp o45ZT6vsQuzdLV37NOYFrySB7x3Vn50aet5RaSnHchXDwRKWcw1zy9cLeGdpUeos2C3wObrNTnvF 6CgDlTUfzU06muUrMP8RYJLIScE/DkiyXH6+L/GGqEzwBiSRvR3JcRwfDkk2tyTycEiCv5+JnQuX 
4RvoYjmXURLZx7OfDH7gkbZgFBTi2KhFaZuhr0uMvMqNc+cZmrq6t07ZlEFiyK87XfaEdn3Qpj9D G/TLjN5yKXPY4NlohRt8s79hYyaOjXAGs21WixWsDOhbmDueYQ32EOLAneXOzmYkR9MXq2I13+1O XCaPKdpIHqtPZvcwkkV+oJHcvXBEtNWOeRFGsvspLsNICkGkAc6z1lu7hn2kqKM1+gHsmgV9ssyX 0ul4NrtF53RWZl45tigxOSXl29oIQhJDleOUfBqYTwPzaWBPMA3scbBtyEIH6Lu3E7kSbAMkEh72 BRKMiX2R+IeAhMzsEujWDNshCGd1eezNmD4L/SAEQY2RCcqZjlVa2SHQgMxJxXj2/+hQ0v4sMuae RSaI4fheLBtP89n8W6rVSmGfv00LV7A21DiCW8hNrMAtHHIqg/U0QYBbONkIi5NlEdXB+jJ60ZDJ sQG8Y7PamPOslrgVVQb/XVtODtlg9kR1qgfGJIJqxuHrTs0KoeFoujQYb6dobqmbvR26Se/l9vz1 bpNj0M5IW3iu78QWQkvo4i3aEf2T1POIGuGcd9jtcGhmuW+xzGZMXTTSHahFE1zIs7tNaNH1hZ5g 3qHMo00PbgPWPK3MOVUFQhXHDLrheyTazRlWE1ys0uXoBh57opZAwkOQl19MR9SJVnpyT0ZSFobR douGtO+xWCJ0dlhqcOsWyyccMpLCOJN3BbQvmPNs0aa+B1/Rabbo4H6oS97zT4sI3Jf0M1wpKKKE KoNjtd9uVaofhom3bk6aYdvHA1rG7uQPoOXEHS1fal41E0haJnqTSkJZiAmuLHZ1cA3ro5RJLJJy T9VuNwJrjXWnXezISSRA7Wx8u+MNDN0e0ArKwjuCmHGWNGSBtkfhVswXs+l9ecWZyLZJPEnU34rL IjSZC9BmWQQfovMtPQADYMzou8cncyY6Ewr6WvHLTam3eEDEslQQUzWdLcfGgBoUYEJit1Q4Oy+0 bfW8mvJCTMFkbhaOK1mohkDQ0uu6VoxjSR6ujK7xeutBc/e0lBNu8huzWn27u8mr2EFBt5tc9q5g QyIRauXGpsPQEOpYOa+1yWHM8HvHDhiTvDNo2PvIN8i2q2JhDYhaGgtNpsuZBdhLM0HzjaGz3Gmr X7L/n6tQYJYEWt/8s0IUpjGF1k3SCXLjFAs5JAvbzVSHcWKAsZlbaCKSOfp8vxTLWgz6KbWrmMKQ 2ldQtRCEFx5djIdgV+3ERfunoXZ6hixRBcR71M42ZMmoiFRb7cCFI9VO7NWOVzte7Tx2tRMdoXaS B9EOoyQhsh07sBeOVDuJVzte7Xi189jVTnyE2sk61E7GZFvtwIUj1U7m1Y5XO17tPHa10/MUJqoA /bDasR8tb6sduHCk2tFe7Xi149XOo1A79faEkZUqSEPtJEeoncLdpRwfq3YKr3a82vFq5yLVzm7N ifvQDrNbOahKT7j3YuD7ymA9lHQGGfBH9WSHMVtJZziJ8yWdbSdRdCnP3mpnoLyYYVUNV0TjeTvY gdWBkxCSQSPHFJsnlWa2/7wJ73PeJLHvF8+bWI2yhFI9VY5C4ZxOL7jMdXmOCk8HrZef07mZ6tH0 Ji0WaoJFHGKNakY5JSINWKtiwOMDgsgizhqGelmm9eHRscTNPjMTZ3F1rGpbTDPJ8HxE4nje4qKL aZ7kQM0JUjaHLDxltXYes0qJjfVomafaLAzUF8TqU7FjyuxwR00SIioksFklF4YoMrX8fCCigNwY BBbO52ppeEBuDEvuHLTt1PjtIajcufAdiUzZsi4TXMgTFejgilMVV6u+TpHNGRiW3K0M3OCYIhR5 tMs4kFc5Mg7PXX5M7iJ7cxdpXE/wVz9HcBc7Zpu7wCTOyl2gjIAJw1NpGs9dPHfx3MVzF89dPHe5 
JETR5i4fg2Cxnk6tSsGDgNVVBBSMBBWuOAxQfC2W2brQG6hS0xGCx+9N2JBcZe726KktJTdqT0/t Lqhy8NleIuPIZE0+gmOGdyWelxQJegDgenrH73nIRAI7vVpdsMNDPFzkBKJADIKxrRjiLgVqc0Sg KG7Lk36wuaHIbeymsgfqfm0nw2RrMqTXXE7S/9q+5+W3ab5bwrQ8Ishog7ewqK8yAGeFifYog3ov M0Gs0cnbTTsSu9gP9pA0BWzHpO2wKxwF/g4lTE/UA6x8Q7BVxrObNNPL1UKt0hzwIcXT3G61joc+ KP0DdLPkYUzjqkEDbiKgaPCENHbC0yCHybYc4i7Ga72e3pqHsFXi5qzZdTg31CmR4DbZSOY0qAZw Lm3EjDayhxuo04Y2Z4sC2mPyphuIizh/ShWhB66FylSiKrdL5SlRDPiTKtwIlGJhXHuBajc4cXeC n6acUa5qn+xuOaMMsN/llDNaqJFezeb7tiVQHt3YlmFQ7U5n72xCCqduJkNkltgx295ZmMS5M0tg EqfKLLlg8//EXKvnkjhc+fdBKyMNJ2rYskiXWcBo8NYoKowJ2YmFhkb1iYUO9IQniakOYssfRe2h wwOi3BFjN1h/LNwaiB1tcssxWz1NcRJnDogKEkrqA6I+IOoDoj4g6gOiA+0lHxB9RAHRXRKfNOKW ovStcXffGiyfw3xrbXuZdb6lerY4hNZyZ8yWb41QRfOn5FsbkF55N93TddPpBmMIgteNqDqXQpit 95zGZe3SHudOaVEkIduzw1u/IeLtftS0CHnUibabOQAylELuSNQ7YzIZn5Uz8NzkorPNa+82SDRU pVfnxkzNYpSn6+l8vL6x5u3LKDc1GHJrZsQja33Bvt0rE1gJc4J9vLAi41/K9vRYBhu6zi5Gc2w5 h5tfxG6hruE9BYW0S/wX6BN3Cx3V/1ybJeo47CkZJm6R3ZMALR4RgQVq60+zM1VaevQKt4BqxKz6 Bd2yziajVZqNZnXQOHOsGW3Rb9lNy8pI4aRx3QcQDxlTN+zLRUEzVQZo02puo5kF59UMkc0yp9K3 VmSYJXtEApuRaDncQsiE56ISh3YR2snh8e8vZTefkEAsOAzdjiiEJMSK2SD2y6REhnWrR5Cn3WyH UAk3pWehKQnenhsKEPaTwrxQjLVB+WezafVM47LVo9uKgamJO1MjfaZW1PXTN5JgWgxlOS5jGmf5 dlKZgiZIlkWPTTqazjTMMAeVwDO3/pGiiOLSqQwfFjRhBpAMpolOatd+jZLqouQhU0vJs3Jy+C1y qFbNMjeqKRnDCgHbh94IxA4W2s0jLDIagTciU2M1zU2qRwtwIcF3SRd2l4xHdvcZnU7B22FNpbuC ACcgHgqqtSH0H02zdVFYcqe3hMrgHtRumiKMZFRl1m0kxQSsqeJudFEkRVZUpq8Ez4vZxP7H2FeA Wpuj6hG5o59PMhG2HCkgygpyTf3TWa1r8P01PRa4LB0bB0RSoJIATx8szI33A9oq0u5CIuewosN0 j+SJITSswyf/NQu0SjnmIhnHJq4yy+oTW/ULU+guclODj8ChsEs3Gg4FEasiatCN/l1XlTX0d+nG /fUmiBSKZPIwLrChG0TI9piebvSgG54aeGrgqYGnBp4aeGrwg1GDkIYhLKf1dKLmdj8reKM8DzEm 5RTO9jje4/iLDRv0LM4NJQjsGnTB8TQjnJrs4H7CiONz2cLxdsw7OJ7Q86YaPUYc78MGnht4buC5 gecGg3MDbfZzg5y5cwPPNX5MrnGaMITnHZ53XGz8oG+ZfGoMzQ7jHbsXDqY2JcKX7TGF3r2Qed7h OYLnCMARErGXI0TkYjiClvs5AqJvzxE8R/DxA4/pPab3mN5j+r6xBEl7xhJolPFw7xGEe3OChDIi 
Jk5HEEjWzAnCMX1OkI8leJ7gYwlbkT6W4HmC5wmeJ3ie4HmC5wmn8f3LnkeVLWYXnLqdHciTmHGe 9885wjE9T/C+f4/pPab3mN5jeo/pPab/PpiehVQYmGD7SIIAU+qPJHh68KjoAY64P4zQ92ixBUFi fyWjHXZAB0sNKsdspAZVF3xqkA8jeMrhKYenHJ5y3AvlJZYhfIByUB17zuE5h48jeKLgicIeotD3 7DKRcXZgDaL2X8ODm7EyqkIWh6Q95t2zyyT0RMETBU8UWt/bEwVPFDxR2CxjTxQ8UfBEwRMFTxRa /5wvZtMtUWh1P2EhzTb91AQNIaJAk8j5sDGJZJLld0B7Z2dhq7u6hkA59RC60bG0HFO2AwhC7IH1 JyAK5fTtYIfFRfp2P4l0TmHfzC3irtuMRLDcpZuKOE3HpMgIEtWzq3Y1Z2AQuHLr6BTJ0OiGCquE XY4GY2FM8EvYGU4gqDy/mRXQeynUAN1j7ta3LZJ5Umuf6lkVvDftFueOZGFIpXzqvu2XpXuafZc6 dA84KVAFOeoeJaL4zF3NqzF5s+JaFovYdzW/n/YP0y7pBG2OTtk8fD19PleL1XLv6s9lY/UnQbUJ nFc/k/uah9+T6ls+s2WqUbflrYfgEdWM56Q9JnQyfWCIM61+2sVt+rro8A1VKN0yk3Q0q5YtKljH 3lwRV8hMS+pdtp9cGAv+YaUZib4fxw6UgzZItTYzQwoJvh94zvTPtVkjYcandSOQkeaG17CgarfK gIhKN5XxtNqtnkuiSEJaumYWk/orxvDSjCP7LSJkv/+dTUv/mOW8o3yJjlWAGG5e3xMl+g08ScXC 4pdth9jaGaUixItujuSh25wPRKGHNetWD5GidI2hJkpX5usKPgR6UakEVRQXjl2WH0OPckJT8LTZ T7Fe/HnHsmuytewRDyoD72bZSZiERXJeXFuN2cK1hOnM41qPa1u4Fkj6flYX5lTHjdUfBtUmcF39 xvBkD669v9QFZVYfiQOW5n0CyjGTrYupmoQ4Z45qNYlOONhz9VveSgTdDxylcgeOrIgVAEfoDGuW y4Y0WLCOLamfPt4TRJsM3tfETPL5t3Q1G82+mBxsJbz8yAloC2EKTvGYw+grhOWNmlhkkH8pG7+L CPRQSNzCJ5FmOXim8rFRi1JBwvQgEtOZDvEYAaSHfK4Wb1hP6KNAe8uWG/NTEBftn4a9i4LK7Lna uyJM9OF+nGHsHYzZsnc4ifPZu3p9Rjnpisj3RnsgG/SGpSQpLNcyIwBcErBcI7d+lUJHmDiCGrdy bYARSNwU9yV7bwYV9gOY9FNIJFKUpBplWeWfzyYTFMZhebiZEStOM90Uh9YSvoHb4icq1vCgy1l+ m9qnRZOkMtxMHBJXQsfNVCQx34EvSzPVJXxhEpYJd1twgkSKk3qOzcQKjbF9NwQzsD0/TWRyKJ00 jJgB7fkjyKyY62K83mCD+1OwQ0KrFGwWPPvtuQsuyGKRUaV3vSPihGc1qzEl2bkQncULtDMrn4Lt U7Cffgo2D2mBhdfTbbxRzefjb/CNMeGHJa51CwjPW/HL7bPyxNnXwHkSIf76Op7dpJm2K1Ct0jyD N4d22ukTc66kpLU0cCbiWsbCIcZxXllioo0kTOPGDKJSyyONp7kjRCwSxBDajBGYL+oZ4gpxRPpx mVq/B+knzBnp26ctGKuXCTxxMVugYgmRzDimNcdKZHVSvX1Cu4hhcWA+OHPMaY1y9ERhTmtlrtPl em4wR7gAiW6+wEjRMNmkSIMg8MpEgDIzt68psjiBV1bZ6kp5UEfaIDJBYDv9J0tv13OtEI2g8qXE TbsNNR+Z65YUyrNecrihu89FejwWlzICy3e7+lxnooHPMHZDyERZzQCI+/NorFNr5KoDD9TNwNnJ 
MNmaDOk1l8TszIUcMpeBQGPEQtEPNFrrnsSxNOcEjdWYCdm5sAfzedDoQaMHjf7c3s/+3F6XoB/k 3N5JagpeMI6/+BOAPIZVApJuFrP1tDqTVz0vfOLcLWZ4aaCXmfufr8fjefB7cCR1bMx8E0n9GECK /NTu9WCllrfVVYt9ZUShRw4jgu5iX0pfEvmSskrw8tvS8mAd0BfiBf0LlS+DhcH8wxf33FDP5LZ1 218D+9dgOVtbOh38LfgZ8ll+vp0s7TUQZvSuOL4j7rfRdP01sGZoacFGwF5ELxh7YQd9lqvFeLb8 +zifK/1T8Owmzze/JV6wF9w+JIktXBbBs/nCLMzYqKWxv/jaZCNV/g57zn/6KfgLDT5dfwj+WJsA phIFNHpJ4pcWgr/6+OkPFNMxx19nk4ma6mA8mhr7nmaz1d9+tijz54mmHXe+evf7p+f2nXwZaaOD +edvy1GuxsHHq2uL3uYvO+4ubzeSkZd3g9HPm5eSQthLz9ZLCCvtfvIDxZYyWmIVjvTMrgyz+GJ0 T8HmznzLBMpjBdPdF6HjQpeC+7+IjYy2WINir3798C6wZF/1lWx2JMvqY5aS3//fpz6Czd0lUdSX jnrFhcnvCLaX6ACCzV3BxfEz3ibmbgRHbPOOD1sUANznxRRyHqxCi0QYsm0o7557Xl+/C+Yw7+kd 9bn7q/ChXwYfP73+ADN8exUm5K/2T8yqsQXDf7267ppiKeNfn17/EbyO376GxwYZ4a9WBg2IZAmJ 8Z9pImJUcHAtuP709o/KVCTxYWO8tf8rx2DlPN/CPHk9Bvzz0WO83jyHiO0YQr6h+BwBubI/5X/h +22s3Lv3f/wGI4aU0sOf4xOO8YaU72rrTXnwvqsP734t58bL578izXcM/3z081//+vZ/q+fnOAZv fcff31zjbxw1hhXyqnp+gWMkdGeMV0c/x6ePV+V3FK8jGIPS8l1dXb8GMf+4ur5+83GbDbb5h/Jv h43xjw9vqjHexOW7kjvPAb9BjnuOej2KN69gjDAW5XME18G7IPjw+z/ffHz/+z97Pge8pZfBh39d 24X8/JdygeGf3lviEXStyn130x53083dbHN3F5bZdzd3vLucZvUA5Dniiz431oazz72cOd9Mq8fe 3Po8OlDIKwsWJ2YSLM1qPQ+m5UzIzs/z6v+HzmyP0I1DejvFXQPYIfTfs6kJPrx9HyzU9MYsO7Gp NXpXQfMHVyT8CNJpMPFuzupb4Q68mxIhw7j77vezxcTi6GBzB959mLE2/8/etTa3jSvZv9I182Ht XVsmqLdq5tbIj2S8iRKP5cxObSrlgkhI5pgiZZKyrfz67QZJmXoCpCLevbc0D8dS0Kcb3QdgowGS uLCY3ZPX6Br/tfYN4kJvpvNK/djbzkq/gdXbOSTNemw1tFq1RqulKZntb7VqtOsNHUnWmbfO4al4 Przpnd5RFQiuP8ONH0SYbb3i/KpSGwt/lNPEEbcmzr1jf8XFMPsGLp84VvLR+IYZF+VlynzvJr5V wg/gZ7yw0BiYTk4vbr7ozeqrhpiLhrAihqhmvU26q4u6zSK6zYK6a4u6q0V0Vwvqri/obqEpthNq Kd+E2FhErO+O2FxEbOyO2FpEbOZFvP4cQ87jx20bVxshfkqXaN9gFDr3Ax6Kr8Y3FWqMhw07kJgF tZMUFN5AT+B9/xqvmKZeuDEvv+/fXtx//vMWjgZTRAL8ee8ET/jbyPUH3JUfTLCHLv2v2fvNsO0s bBsenNEDuOJZ7I7Mallo/CSxhT1Srh/V0PUF6Lo+dF9EEdX8JBsCfyo/RD4MXR7pp8+OTTM4M2xa OjMgznRk0G2dNOFLKC2gcsVRr3t5dwxDnBCowGb53tAZTQNO24bgeEO6RtPvKtPoBiIu+3JzcU1l 
SFlNDOMKIn3NIxiydBU/4pMOpPWNTsJU5dR1dYtXigVdZq1ptmAwi1CTP4QJXuCsyVRWd1Tp19Rx I2Qx3aBFN1GFFYA7P8J8RO7mdKCBfm2bqqriB/knWIWLi9eeEzncdb5Td7B7P6tid3N9CQ88fICI Jh+c9qPAIXtl+nXkB7YIOsDME8xzm43ENyrPXr1GwqPi5kXvcx9mmFx1aPWlMqXHg0cy+65/AVMv jA2yp4LojN+F9OXMsx4C33O+C1uBRptKFasDlyISVoTGUM5bYThse79/T++x9ANVbebC90LfxShY voschD/fd/8LWtkzWJsEH4Qlu8ORRtE0EJWKUtfNF8rnUgH4DebFwBB7jGFoQe9cA4TtCtITYz+Y YdRqLVwsGK3HM7PeMhlr1R6BP3PHlbE5wvFSf0yE0EG2QJrU20a18Qhpxe5EZqePcgydYAwQwEGO qih0gRweBPGotIXLZzCVk4zce4RwIixn6FjJhIeuhXrDqFYaVVwHjfze9U0fjtzJ379izFmL1U2V ukvi/Sx+psz64cDatapRexsRVexqtV6v1XDFojUorml1cbpZg2m0m2jpwphjjWazabKGnoaeP/Wi bRrqquRezocfGVyj/xEFZ63aBzhq1GL9ZzQdHZ/AZeKnNX+phW+m6Mwwi0CAcaZfHZEqb9JNlrd0 9fqyoyk7F7nwcTjpyL0LhCCy0hWQuxF+y2kxSSHAkUD32aomr/jSLBUG4tmJ98GofsVMVQzjizFd 0tw4KYiHjOOhHcF0glcmBQCO3Cley5hZp3KqssydTq/MrKCAnFzf1Kp00YqNzJ1Px8DOakkB7DX3 tU4lUGRSwUtGe3VSMfDqWNdj+//70cTO9KuF+xhNqqhdTrl7KltTUfXzBHnse0fR+DgDRs9WwdxQ TCYUT0O1NlmlnflGO9UaeoV2KoFitDOba2jHqupr2b8I7cwz/TLzNtrpyeafxEuhXfWNdirhFdqp BIrRDtOZNbRrqDfV/kVoVz3T35/YB+3+GbMdhnj0EMF0AjWyS1XKHjujeMF+b/lh9Gu11TipNVWl 3U9Xdx24FSNc+9KD54njkY9rJhjysePOQJlIxCkPlUWi2UTAxHIwE0nhVIvYi+s08en1Lj5/enf9 ngoEmUMQIjmOpQP0yYfxOK5cwMQPQ4eyaEzA4kPkdBqLqbyhi6PaG0gKR5S7TQLMtIK0LqslF3sk qRRSSWaeBKaVIr1i/8U13OLUhURy7JGAr/iF8Q2OZLFFo85CvrgLuBdOeICLERjEMPFxjA7916jo bb2TJdfzPnx0vEf4+vHThy4ac337R4gkA4YQLWDtY/hP42ReU9U7gbEB/ny/8Bfr4dlOLrncq81X +/X41QaPM72tlQ2g+/Dz1V79/OXcSOFxwjcZTvxgVn8YfK97uR5+F5d8OTf3a/PFenhVZXULZvd/ u/s0ud873+DmHeh80/uyHlRVltxmaHcD3XbhQ7/Lfnzvu3f7jVh3A4dVzo0PLt+40xFQ+f6Gku7+ dDLxgwiejUq7CUfWMXRtPoZzysgVcBNvghZ6N/HWCpVNc+ZQ3kQ/h1pQ1sF0YYo9wFZxtqLKGZOj cWTngvqpp21A/6J/Tfdxxc8Wkd2V65082Z80gDIdDJ5mjiOlr4fA07zM9kXo/UcEL37weAJUFv4J k9FfCU04wdNPFaDmTgQPwp2EJ5TaRSgdCAqzWtnpZa+Lab5kqTT5/Y2KqnOpeVGfNuhWK/tVZjRN +KAq7s/h4vXg++7tHSaJvd4XbTtuZXGfhJnZ6p3TXpkEAEzwOEYOIlziYcfmButwjxLBegccn44x yTM3cgey1YpPLtGvA0rhqbYNAyG8+R7DDuhCng+W6PESoTh6k9DlUE/hLU7IFt8VdNFky5g7BH8l 
ky1/6trgYYY+ED9Cg3jTIH6ABr7iGC7ReXMHxwyWzU4XewicLvZ2M9ta1kDQ7T3gWonZ1m78W8EV aRT3YDNLfW03d/C1nHvjVWUnz2oQ6NwZfeW/dIDcd2qp1/YAvaveXGoo2nEX8BedygDAze3Vu6u7 i9/fIFLGDZv6xYWV3vJ8vdVNbxZ7qy+10ktd0TW9q2IH/016R6cdO/AwERFY8n2V8d1t88Hq0HkF dwc3qbLhfw03rae4/W9CgvW9Ezl7RzOGnX+64ul0xYtOVzydrgY7TFfDfL2la9CpyN/bQdrbQdHe Wmlvba3equrXyl3BG7neEFvOkKTHtOcnPNgJtOh2Z0PzXNXdxQ0IeS7KCR/QRq2DKjU6gWM0GlWz rXmOhNQM8Ju1+I16vfrWhdbJvFe60B34fQ4bzg8IYm+Osl1LuyHtkDp1rA6E5+uveP9815cLsUd4 mvoRzuQ2/XnfqNSVs/Eltdxy1KbO0oM8QLevGbquv44vI7SyevSGITrF8iezwKHtm6OLY2BtRPIf neC3se9xuxK+DCq2OFZNXP331/DXuz5+Gz2AywNM4eRbl8/kow3Am44HIsClLHrPFoPpSHOXIYX9 g9wHPe7xkRhThX++ilcAOP78IRABKvdzlCsWRG3BbdoALChuDZ8yknCUPJZHFSwehs7Iu59vqtyP yZnvZOGk17/GmWDCB47rRJvqPC4PIxiLMES/Uf1AcDpIU5U7tKqCC6fhh5MUJtu4+s5mk52J5eCy FicuB0MYxxwNpH3DYpj2HjDFHjCHu2DGpTs+mtDDzeONsiHHDO8ZUzPDlHW7S/4s4L99Txkb+SBB W0QPdP70VtAjCgRceSNkqCSY944aADYggQjswME2FfgzeZaBUWkYu9bIkxFM26BUC9M8rr+IN/ds q2J87X6jffOsCvxI9wKYVTiSB+zpuQ8vx/S1lsa5l9JC1O/X73+/7HUVYihgLLl4Pt3Qu6QMW3Ss QQsvHXIgRv5CN3bcLVn2qt79Nhu82l716uWbV831XlVpLO5VVtCrbaVXqy88EEDrbehHfkCz3YWP l0rfjSfvuOwqB4Es3ibP/zArZqNimBUDVey48bcUN81993Vxa8g17WLcpIokbqy9Nm5KjaEVOgYo fKX082n79fU1AesklyKu8D/6hCp9uKyQt/egtZg1Kj2+ossJxlLNu6v2X1CtUBUHHWXUT+QzGeB8 8WvzRN43SFlxQ6WKNAC6Xv6LCVsgrOi0a9FhGHkjY7d3cRHfzd2oG/2rU9boweV1/wNIfXCD/TGg +6mPEVVt4MxVsfJUmeWpqpanqlaeqnp5qhrlqWqWp6pVnqp2iUO4zOmixPmClThhsBJnDLb/KUN1 6Gg5TdC7n31dmtBcmyacv6UJrfVpgkojeYv9yDSB5UwTBgtpQitXmsA2pwnVH5QmsPLSBFZemsDK SxNYeWkCKy9NYOWlCay8NIGVlyaw8tIEVl6awEpME1hJaYL9tlT6Gtr8GzRbrGbKm7frzDylMjQ8 8MCWs3CIhvhBCEc1nNRbdGOwqui5iv8/9LhXujuBbgQEh+6cV222bAKx4ns70v24E6CHrq5+m57Q CpOjdZc3n8EP4N0XVZXj4Jxtm2ioqwNT79HzXzyY0HMW5FMb5NZGbpu7UUSW2fGxPjTvUQ+CJRCD PcUmg188NisgJRH34JwNxB3sSNyMzUWJayYQ1p5ik8EvHpsVkJKIe3DOBuJaOxI3Y3NR4lYTCHtP scngF4/NCkhJxD04ZwNx7R2Jm7G5KHFrCYTYU2wy+MVjswJSEnEPztlAXLEjcTM2FyVuPYEY7ik2 GfzisVkBKYm4B+dsIO5wR+JmbC5K3EYCMdpTbDL4xWOzAlIScQ/O2UDc0Y7EzdhclLjNBOJhT7HJ 
4BePzQpIScQ9OGcDcR92JG7G5qLEbSUQzp5ik8EvHpsVkJKIe3DOBuI6OxI3Y3NR4rYTiL/3FJsM fvHYrICURNyDczYQ9+8diZuxufAGRLqJ8bivIntGwQ5V9hWUsvYgDv7ZSN9HUv+ounlojYWFyZpu XLj7CkZGwQ7BWEEpi6wH/2wkq7vrnlnG6ML8TfcvxvuKT0bBDvFZQSmLvwf/bOTveFf+ZowuzN90 G8PbV3wyCnaIzwpKWfw9+Gcjf71d+ZsxujB/090Mf1/xySjYIT4rKGXx9+Cfjfz1d+Vvxugi/KX0 I02hJzI8ptkyqua28FTb7XarrReeJfxi0VkLkoQhudtgH9w9+GYzbyekfMLoh0k/VPcpr1palK1p wvy0p4hk8ItHZAWkHLYefLOerU+k/InY+kRsfdJka8bSomxN0+PgG7Bao9YyWbNhbAtJs5YvJBkF xUOyAlIOXQ/O2cDXgJQHGiWwJfuKsjRNgsN9BSKjoHggVkDKYenBORtYGpLyUJOlGfuKsjRNdaN9 BSKjoHggVkDKYenBORtYGpHySJOlGfuKsjQ9ITbdVyAyCooHYgWkHJYenLOBpVNSPtVkaca+oixN j4M97ysQGQXFA7ECUg5LD87ZwNJnUv6sydKMfUVZmp79etlXIDIKigdiBaQclh6cs4GlL6T8RZOl GfuKsjQ96PW6r0BkFBQPxApIOSw9OGcDS+lxLrby9byr9hVlaXqqa7avQGQUFA/ECkg5LD04ZwNL Z6R8psnSjH2Fq/vpDsH3fUUiq2GHIvYKSkkV/oN/NnL1O2n/rsnVrIGFycoSDM73Fo2sih3CsQpT El8PLtpGWc478qcuabM25mat6q2Fy89lLvJuwfkDjevLD+6TKtLnMrP1z2VWusEKHRPwTx7xe+9Z o3VVuzU2Yh3od++6MOav8OWy1z1j1SpYY5teW/X2DxvUWgZYkbvytWHCYGyP+dJfcMswwAme1P1D G0xNG4z1NhibbWhp25D6waXI2XQ25ajfj3g0DcFAtsWPM4Sq+q3Fmf7sjKV6PeYye4s8pTzDXvb1 fJG97I29xnr2qjQSH2u52FvPw96qHnN4ay1zeGsTc4hpkjmq/qENNU0b1o8gvnEE0XEwXRuqP5C9 tVLYa/7gNw3UK+bXi0X2mju/aYD42MjF3mYe9tY1mYM/1jCnbW1gTruWslfVP7ShoWdDe/0Iam8a QWhDS9uG+g9kb+NHYd18usEf/TMz+yzbr/itUTWqnZu++eGEPgzjD71v8XNuG8YJ/qjJrrMTpnrp QigCx++A0zJq9P7VS5DZ1CpSHpjul782wajMGTuWSDo99qehSF+xYPnjse/JVyxw15XNVEh2BwLu 2AwmIgh9j9PbbLLvyiEsOSSVnXO8yTTC0X8HdwH3Qle+66YvIjCxyWzgYzINPIQzy+VheCZbxz9V 8ybZ1yD8qFF7ZQBg1htNzMHPVK9pWRA0UbBabZi5BWskyMxGbsEWmdps5BEMQ2HKLkK1JdcZuQSp i1A3cRbNKViTgtVaHufELx/h7sjH9cjDOAU6msOoBm5KvYaaeg1NqLoaSvV41RSqpoaqaTgKp0w+ jfwxjxwLh+QscdtAhBHgAsl6DKfjMX0znHoWHbDuwEh4OE1Y9+hQ1SINso3xc6tRrVXoKkT+F5ae fbFFa/XD0RKizjvarOnAsfRf1KV6L59q2lHKNxUAeGmssD/gz4/dT2/vcmeVFpwLD94HQvAAfhnJ Pwe/WdyzBU1t1kMFZ9t/qFIhnIUH01EI3LbRtsGM3jHl4LRYgZ4jr1S/2PxZjH9Dyx94pANJBO0i pWxBZQQK3W33+hJ4EPBZqHqEOgkTH4OpB5WKTmvL90LHRkKgIlkN0JSjHheX+Z5fZJZf5DW/yEt+ 
kef8ItP8IlF+kTC/SKArQgeZ5ctzbWc4xHHpRfDlC9I08rVqSgmEuTsE2xlisntHJrt3ZLJ7Rx53 hbCC+AWGY1u1WKDW9GbPX4gyOjNa0jjM0zjK03iap/FznsYveRq/5mk8y9P4e57GGG+d1niZ8HDg dyARidXElsWdifsfuyz2chyYOJZa4U/SkGQJE18vfEy+OKUj3KU1AzWRVWRQvWVtGez7FizVq1iW sWZbsFRJxjLW6xYsVba7jPWyBUuV7i5jPW/B0sx351jTLViqCWQZK9qCpVotL2OFW7BU0+AyVrAF S2ddS+sD1/UtOa8yo94wH8/lEkM9xybiUl+yTIIQl9soCRwzw2cB8vWozAB/GtGLVPG32PDwJLNw U7mP0su6fI8z5ta4skWwjmphcnp6CoHdQYUv9FPVPHbXCfgddkImdjTuMEnitSCkPPCfEGZBSHn+ OmHsgpDyOGwyZBaElKcTkzG7IKQ8LJZMGgtCyrM7yay1IKQ8SpFMmwtCyj3tZN5eENJMNxaWHphP 5s1WtUW2pprKO/KUmaYWwpvZE32ztySWmmZvySs1EbaklVoIb1mlsuz5lsBMqjmynSetxtlkByW0 lMjSaTIHx3OvuTD3munUa8LYoddUq4pra3hv5ue9rsh21uqEI8NafaXbOKejdDvndBDeOKczEaWc M/NwTqvxAufMf2gpWeUc++Gcy18ZKCAy0S8mbIt3vpWrKhfJxjvPiuop94LqSa6P1EpW42386Hg/ 5g+etog016XjSbO0+ydgcc/zI/oMGs/TIxjxSoXZ+wATiCMS0dldQAPn1c7Lz5+uVOa+c1wRvxEd fsJR9RO9yY1OUsmKPQ8CRwSYOZPhyXmquKQbPQiYeuhMd0Yt4xRboeqvd33auvNkAXf4plc9IfUj ehYLShFEICz/WQQzoF2/OUpHzglHrj+idIt2o1CWuyqXXXn2XnD/fNfvQI86i/4KfHTf0eswzOAe y9NrvufOlBEKhCAbp940RLD4axiLsR/MOmC2ao8wxCaqPYcSA/34tz8lJ+HYTQJXAbjwx2Mnih34 jMvGOp0w9D1bNXKv/rqrAoYHo4LD6mQeAUiUaIifDsNOzDzay3rzg5xKfBxetIdi84hjI1vkGzHm P2fEqC74miPG3NOIKYC7P9JUSyaNaXSMWoIYPkwjOunx1WRG41tHfpZxkcc/qOiRwAZigLPEEkp9 yS5vGOIU7/IwomMV6HqZIYhXJ1oY/ptFp158TYm3sd1Mv0KF/O3NRQeG3KFDUZiDYBAivCADlXFc CCYWpQSpUUciCDwf6sfLXmksgX6I5zKkyogsOpoEvnWM8fcnE2HrC2MYBJ0BwRiPHY9L7iwJNxNh 8hWpwtahMyJGsGy50GQdlupBvxCHgVVqFfYza2E6IiQ1K0sCzWUmL4idSAtD5J0l4Fc4oz6ePY7D UczzpX6ugfvoeNNXQL+G9Cwys9KomGaF4QizeOD64W+uNeH2MRyNLGveqlYxK1UwDaPJDFYjxyJd XcFDgQ0vxcDhcRvztHp8DD8z6Pdu4G4qgExpAGvQy75ZC85v+3cSRmEjjVOOBMBJUHTk9e7XMxz9 Z4uX9nWS9ILs/2PvapvTRpb1X5na88W518YavSGo3a3Fxk58dnFY42RTJ5VyCUlgrUHCknDs/Prb PZJAIGBGEujccwrXrmNj9dM93Y9GPT2jmTPwyYuLk8Wzx7cQFw2Qu06PTM1ZmyMdizsGhJhIa1/k LPtRa6TCRydztkr8XTnYGGMF1mSaToAZSH27JLCTs5dK+wCm646wmyM7Bi7viAXGKqzDYHEdJOsV yyI7a8hGEswY+fbzoAywk6fEKP2okotHjpUDho/oHoCdPPCousU0RVsC6/LCx2KkcDz7YTbyHuD+ 
hA5NVzVNbukcmW7vhszQbo/bfcZLae8G3T5aeN3RWpD1SZIM3Vggs792ejwTY4wvg+496Tavu9hs xNAuAYMSyZBb8ETAP9OW2mQdHH5GeoPrexJ/tZpiOq7hn1iHHNt5jXYqqQ78c2Ud3UU71CboUI0r ytpBpA58xd8xfsmXRG5u7/9AjRqlVLwdA6bjSop9pUpCcp3+zWVsmxK3vyNlfYx/rtz+3uX1+6T9 CtOhrMTx41WPXVFJB4BcJO1XmY4WXdNxUbkdg7tOHEe1q6MOSmNfdXpdhPnQ6fWu7sjia/GH+Dcx HR/6V4mOq2bsK2OtHXiFVK0dKR/VqwvUoTXVuB2kR24I6X/86+ru9uNfJduBXmqT/pceEPns15hg 7KdbyLMJj5WbpGkJabqQlhfSvFxmk7RSUDo2M2mAdMbyizKC6YOzjKwiFxamSbMXome6IMgFJItT GACFTjSfwbCZWSKtfZ0l/4patgGUknWMFFTU0n/5nkP617ckML2xE3JzU3jodUj2izESv1SJ+8Bk 0oqciqIEk6aSamhNvvStH0whjyYLCSYt9rB2YGDx9oBew2f8V/VbWntdNp6rH1rbzrUbBlutApKy FltNDEPVDUNQMtteRZFami4iSduLqwt4Knntp3d2705h4HvzkfRhdA3Z1iv0rzy1sfAfrJs4Ma2Z ++DaX6VXiX6DIf7MtZJfl2+C8nrOPgwtnTD0A/IPeLDgPTCfnV32P4n16nlD5FVDaBlDeL3eNt3K qm65jG65pG51VbdSRrdSUre2otsAU9I3m8tGUV9F1KojNlcR9eqIxipisyjizccYchE/07ZhtBHC b+kQ7RsZh+7D0Aydr9I3HmqMBxe2SWIWLmtJQMkS9JS9wiedyWLhhrz8YXB3+fDx8x05GbLXvuD7 A77/JJHxxB+aE/aLTOzRBP8XbP122FYWtkUe3fFjvHCqMjJVs9DwG8N27DF3/MiH1lagNXHogRNX VxkbAn/Ofol8MpqYkZBVLH12bezBqWTj0JkS5EybBd0WSRM+sXdLWLnipNfp3r9jZV4ssOF6Mnc8 j1fOEdcb4TMaf+aZFq+WQ1h8wxQ4yKqJ4aJSjq/TjWg6ih+bszZJ6xvthKncruvqDp4UK7pktSkb BHeFwG0b8N0gYs3mrLrDS7/m7gTfgfsBGdPEDaOwQci9H0E+MjMxcyI6+LUl86qKSYXXKl1cvPHc yDUn7g9sDjTvH7zY9W+6WFJ/jI98gG4/Cly0l6VfJ6zq3yZUPoU8F8/MYL7hefbqNXI8LG5e9j4O yBskV20cffFM6ZnBE5p9P7gkcy+MDbLnDtIZPgvxwzfPegx8z/2xWv7fgBZBftKw2qTL3qABYzDn bVBKSe/DD3x/KH5u8Wozl74X+hMH59InwEHy+X3nf4kB2c56ET0niO99MaYCjaJ54KxOpG8U6X/C fC4VIL+RRTEwhBZDGPBIEgEQWhWkl861qgYMFiTj6VzWDJlSnHo1X0x3wmJzAveL9pTOz1qQPQNN tJak6E8krdidsuz0id1Dp/HcrQsc5VHoEjg8DOK7El/FSl+pi1jWGc4cyx3h+2eswwPXEk2XlIau wDho7Pdu+gNyMpn9/QvEnBq0ye0Lusj7t3hnlc23A22piqQu7wgFmqpomqrCiEXoprjB0cXZdg2y 1GpSTV6556jebDZlqotpYJPvuzRovOSe9Yd/UHID/md7zOjq7+REV2P959gdvTsl3cRPG/4ohC+n 6FSSy0AQ6Vy8OsJU9tNJlmW6etNtC8ouRC59uJ1E5NL1C/gENCdsHhYHkxgCI790YfujmSkMnBc3 ngfD+hWVeTGMH8bxLCVLCuJbxk03aQh5vRDcuXN4llFZkyRN4qV5i+6Vyg0N3yCFznWplqcLR2xo 
7qI7JvRcTQpgr4WfdTyBMp2K1JClfKciAXV1Mbb/v7+b6Ll4tfAQdxMvat25OTljV2NR9eMMeOx7 J9H0XQZMliFKYeTMZhhPLmnztJOXtOONoXO04wmUpF1zE+0U/rPsP4R28rl4mXkX7cRki3fitdBO WdKOJ5yjHU+gJO1am2in6tp/Ce2Uc/H5iUPQ7t/R20GIx48Rmc+IinbxStlTdxwP2B8sP4x+UQzt VG3yqtjcDRrECstYFoneZg6ZWe7m3SU2DmIvb9LEp9e7/Hh7ffMeCwSZRRDOKP4SAbr1yXQaVy7I zA9DF7NoSMCS1wElqU15hW5RHJ5Xk8IR5m6zADKtIK3LCsnFHkkqhViSWSSBaaVIrNh/eUPucEHv ReDaY4d8hQ+kb+SEFVsE6izoC7ZZ0Mxkq/qHMczZcrswvSE29b5x97Lb3ztgzM3dnyGQjIAkNQht vSP/I50uaqpiKzC2wF8cFv5yMzyt5JLuQW2+OqzHr7Z4nIpNrWwBPYSfrw7q508XUgoPHb5MoeMn srI3+F6nuxm+iks+XciHtflyMzyvsroDs/OvziFNHvQutri5Ap37vU+bQXllSd4WvPvmA9u2dN+t 79wfNmKdLRzmOTdeuNyfzMcEy/d9TLoXm01JjVaTnFjvSMc2p+QCM3IO3MybgYVeP55awbJpwRzK m4nnUCvK2pAuzKEFVEn3EhDSzOxcUT/3hA1gm0iH82HyeoCbjneKZH/MAMx0cENRsRyHSd+MiJnm Zelu2t/94OmUYFn4J0hGf0E0xw2ef2oQvNyNyKMzmYWnmNpFIB04GGa+srNur5N9gabzvs+j6kJq UdTHCbp8ZV+hUhM3rhSFi8eD7zt395Ak9nqfhO24Y8V9FKay0bvAuTIGQCDBMyFy7NUfaNjCYBHu sX1j28T1cRkTW3PDZiANI165hD8OMYVnb4AMHcdbzDFUQHfY+mCGHg8RyqM3EZ3d6im8ZSKyZVYF XTXZkhYOgR/RZMufT2z2/tXQ2YcGZ6nB2YMGM+cYk6GbzQqOGa6bnQ72ADgd7FUz21rXgNCtA+Ba idlWNf7lcJ00igewmaa+tpsVfM363nhU2S4yGiS47gw/8r+3CbrvzOKP7QnpXfUWUiOnFTcBfhCp DBDSv7u6vrq//LCESBk3aooXF3KtNYu1VjS9WW2tuFSulaKiG1qnQAP/S1qHqx3b5HHmRMSa+NZT 8nbb4mZ1cb3CpIKbeNnwf4abNlPc/i8hwebWOQVbhz2GXby7MtPuyizbXZlpdzWs0F2NirUWn0Fn TvHWDtPWDsu21kpbawu1lle/5s4K9tl4w9mxhiRdpr1Y4UFPiaEYhi4JrqvCfZcdti7KDfHYHaGF KiquwJF0XZFbgutIUA17uXkTvq5pyrIJxumiVaLQbfJhARsuFghCa06yTUubwexgOkWsDhzPFx/x sh0rurhv2fPcj3DjGfz3QW9o3N64i1fuWGqj0XQhD254x9bSCS4SYo8RHFk94fvr5MTyZ2+Bi9M3 J5fvCG0Bkv/kBr9Nfc+0G+H3YcN23vE6rsH7G7Z9AXuTf2IGkMIN8Sl27rIXRbz5dJhs4QCD4OF8 LDjLkML+ie4jPdMzx84UK/yLUTwHwPVJiAdIzXGPbM/3C5QrVkRtx7RxArCkuDV6zu79fmI7I3M+ 4a5RM0N8of5hManygNsikGtWOOkNbqAnmJlDF3eW3wLEtjWYOmEIfsP6Qbx7kcJmaHkFFxNvP+ik INmG0Xc2m2zPLBeGtenWBSzmDu4YxO3DtmDaB8B0DoA5qoIZl+7M8WxsBsn2GyMTMrwXSM0kmdXt uuaLQ/7pe9zYjHzIDW0nesT1p3cOblHgkCtvDAxlBPOu8QICF6BAROzAhWsa5HOyl4HU0KWqNfJd 
x/mI4i08a6wfpcZUFDjOZ6eX0kLUh5v3H7q9DkcMBKQ1Fy+6G8hTcA172xoa8OhgN2LkrzSj4mzJ ulfF3rfZ4tVW3qvdpVflzV7laSzvVVrSqy2uVxV2vCKOt8kg8gPs7TKn+KRlV3YTsOJtsv+H3JD1 hiQ3JFBRceJvLW6C8+6b4qazMe1q3JiKJG60tTFuXI14XJVEOL7i+vms9fr6moC1k0eRyfE/O5iI DSvY6z1gLWSNXI/ndLnBlKm5vmp9IUoDqzjgKEk7ZXsykIvVj+VT9t4gZsU6TxVqIHhYKjswtesG jhWddSxcDMNeZOz0Li/jt7l1TRpcnVG9R7o3g98J00f60B6JdG4HEFHeBM5CFa1PlVyfKqU+VWp9 qrT6VOn1qWrWp8qoT1Wrxlu4zu6ixv6C1thh0Bp7DHr4LoO36Gg9TRB7n31TmtDcmCZcLNMEY3Oa wNOI3qL7TBNowTRhuJImGIXSBLo9TVD2lCbQ+tIEWl+aQOtLE2h9aQKtL02g9aUJtL40gdaXJtD6 0gRaX5pAa0wTaE1pgr0cKn0NbfMbaRpUldnL2xqVz7AMTR7NwGa9cAiG+EFITlTo1A18MZhX9Mzj /xW4ANkPfHwRkLj45jxvsmUbiBW/25HOx52yHaPzn6YrtJLdhkm3/5H4Abn+xKtyHJ2zaxINdOF+ uU8ebtQ7w30W2K4NbGqjsM2dKELL7HhZH54OIwZBE4jhgWKTwS8fmxxITcQ9OmcLcYcViZuxuSxx 5QTCOlBsMvjlY5MDqYm4R+dsIa5VkbgZm8sSV0kg7APFJoNfPjY5kJqIe3TOFuLaFYmbsbkscdUE wjlQbDL45WOTA6mJuEfnbCGuU5G4GZvLEldLIEYHik0Gv3xsciA1EffonC3EHVUkbsbmssTVE4jx gWKTwS8fmxxITcQ9OmcLcccViZuxuSxxmwnE44Fik8EvH5scSE3EPTpnC3EfKxI3Y3NZ4hoJhHug 2GTwy8cmB1ITcY/O2UJctyJxMzaXJW4rgfj7QLHJ4JePTQ6kJuIenbOFuH9XJG7G5tITEOkkxtOh iuwZBRWq7DmUuuYgjv7ZSt+n9vpBx4IWliZrOnExOVQwMgoqBCOHUhdZj/7ZStZJ1TmzjNGl+ZvO X0wPFZ+MggrxyaHUxd+jf7byd1qVvxmjS/M3ncbwDhWfjIIK8cmh1MXfo3+28teryt+M0aX5m85m +IeKT0ZBhfjkUOri79E/W/nrV+Vvxugy/F0s9QaIGQuPLBuSIu8Kj9JqtYyWWHjW8MtFZyNIEobk bYNDcPfom+28naHyGcVvMn7jvaect7QsW9OE+flAEcngl49IDqQeth59s5mtz6j8Gdn6jGx9FmRr xtKybE3T4+AboaquGjJt6tKukDTVYiHJKCgfkhxIPXQ9OmcLXwNUHgiUwNbsK8vSNAkODxWIjILy gciB1MPSo3O2sDRE5aEgSzP2lWVpmupGhwpERkH5QORA6mHp0TlbWBqh8kiQpRn7yrI0XSE2P1Qg MgrKByIHUg9Lj87ZwtI5Kp8LsjRjX1mWpsvBXg4ViIyC8oHIgdTD0qNztrD0BZW/CLI0Y19ZlqZr v74fKhAZBeUDkQOph6VH52xh6XdU/l2QpRn7yrI0Xej1eqhAZBSUD0QOpB6WHp2zhaW4nYvNPZ43 b19Zlqarut4OFYiMgvKByIHUw9Kjc7aw9A2VvwmyNGNf6ep+OkPw41CRyGqoUMTOodRU4T/6ZytX f6D2H4JczRpYmqzpBIFpHiwaWRUVwpGHqYmvRxftoqxpttl3UdJmbSzMWt6phev7Mpc5W3CxobG2 vnEfU5Huy0w378vMdYMVujKBf83IfPBeBK5WhK+Gi2ibDDr3HTI1X8mnbq9zThWFWFMbj61aftGh 
akjEiia5jyWZDKf21Fz7g2lJEnGDZ377wAZZ0AZpsw3SdhsMYRtSP0wwcjauTTkZDCIzmodEArbF 2xkShX9qcaY9lbF4x2Ous7fMLuUZ9tKvF6vspUv2SpvZy9OIfFQLsVcrwl5FjDmmsZE5prGNOcg0 xhxe+8AGVdCGzXeQufUOwuVgojYoe2SvWgt75T2fNKA15K+Xq+yVK580gHzUC7G3WYS9miBz4NsG 5rSsLcxpqSl7ee0DG3QxG1qb76DWtjsIbDCEbdD2yF59X1j92z58G5zL2b1sv8KnkiIp7f5A/v0U fxnFv/S+xfvc6tIpfFNZ0+kp5R26EDqB67eJa0gqnr/aJSybyiMVgel8+rINhmfO1LWcpNFTfx46 6RELlj+d+h47YsGcTNhlPCS7TQLTtSmZOUHoeyaeZpM9Kwex2C3JbZzrzeYR3P335D4wvXDCzroZ OBGR4ZK3oQ/JNDFDcm5NzDA8Z1fH33n9JtqnI36kq6+UECJrehNy8HPeMS0rgjIIKoquFBZUUZDK emFBA01t6kUEw9CRWROJYjRbRQWxiUSTpSLOYYIqE1QKtTE+fMScjH0YjzxOU6CTBQzvxk2pp/Op pwtCaXwo3vaqKZTKh1IFHAVdpjmP/KkZuRbckm+J24ZOGBEYIFlP4Xw6xU9Gc8/CBdZtMnY86Cas B3Aob5BGshfD74auqA0DnkLof8cSsy+2aKN+crKGKHJGmzUfupb4QV28c/l43Q5XvskBgEdjg/5J Pv/RuV2e5U4bBrlwPPI+cBwzID+P2b/D3yzTsx3s2qzHBvS2v/JSIeiFh/NxSEzbBtuGb3jGlAvd YoP0XPak+tk2X5zpb2D5oxmJQCJBO0Ap28EyAoburnPTJWYQmG8hbwt1FEY+BnOPNBoiV1u+F7o2 EAIUsWqAoBy2uLzMj+Iib8VFXouLfC8u8lJcZF5cJCouEhYXCURFcCEzOzzXdkcjuC+9iHz6BDSN fKGaUgIhV4eglSFm1Rsyq96QWfWGPFWFsIL4AMOpzRss4NV4sufPSBmRHi25OCxycVTk4nmRi1+K XPy9yMWvRS5+K3LxjyIXQ7xFrobHhAc3fpskIrGa2LK4MXH7Y5fFXo4DE8dSKPxJGpIMYeLnhQ/J l4npiDnBMQNewqrIhHfK2jrYjx1YvKNY1rHedmDxkox1rNcdWLxsdx3r+w4sXrq7jvWyA0sw311g zXdg8TqQdaxoBxZvtLyOFe7A4nWD61jBDiyRcS2ODyYT32L9KpU0XX66YEMMfh+biDN9yTCJhDDc BkliQmb44hB2PCqViD+P8CBV+Ck2PDzNDNx47sP0UmPnOENuDSNbAGvzBiZnZ2cksNug8Dt+510e u+uU+G16iia2Bd4wSeK1IsRd8J8QZkWIu/46YeyKEHc5bHLLrAhxVycm9+yKEHexWNJprAhx1+4k vdaKEHcpRdJtrghx57STfntFSDDdWBl6QD5ZNFsVFtmZanLfyONmmkIIS7Nn4mbvSCwFzd6RVwoi 7EgrhRCWWSW37LlMYGZKgWznWejibLIDEkJKWOk06YPjvlde6XvltOuVydTFY6p5xbUNvJeL815U ZDdrRcKRYa240l2cE1G6m3MiCEvOiXREKefkIpwTuniFc/KvQkrynKN751zxykAJkZl4MWFXvIuN XHm5SDbeRUZUz4UHVM9sfMRXko+3tO94PxUPnrAIM3eCy5Pe0uafEsv0PD/C34nAfnoI47xiYfYh gATiBEVEZhfAwEW1s/vx9opn7rU7ceIT0clPcFf9hCe54UoqVrE3g8B1Asic0fBkPVVc0o0eHTL3 wJmTN7wyTrE5qr5cD3DqzmMF3NFSL79D+nw9aJMeioL2wAdjTl5HYQbjHVsL5nuTN257A8dB/XNv 
HgJY/DGZOlM/eGsT2VCfyAgu4VXwa3Tb09/+PICBFtwJEW5J440bhFz606kb4aSbE7zAIEzD9Xq+ Z/Pug6sv9wqBOPgekvQ0BsBRXKJEQPxsFLbjOOLM0NIP7Mb0gaw4I2GbkQkX2U4x/sn/Hv7xHp+H C4FScwgUo61pa4jhHg8QFMPnrAotArLPPaE26SXsAEH4RtcuXI/Lvr3IxxfwoijIvr2YGxBv9GIL /tts4F7OBBTD53mxAMhevbhBLxE8ExBkVfmwjhXA5ztWGGTPjs3pjR0L39boqeqbDdzLyX9i+AJe FAXZtxfX9RLBk/9QdssttTfH8vEFHPt/7F1tc9pKsv4rU3u+2PfaWKMXEFSdWweDk7AJCWucs6c2 lXIJJLBikAiCxM6vv90jhCUkeRohdO9uQSXY4Jmne7qfGfW8U0HKNmxGvR+j9PEOPY2c+lPK/X40 fLkVySAlWzEllxHv98O8OdFHaYaV4xMMSwUp27AZIZIYsLB36WlmK1jKLX40fIIVqSBlW3FXLiPe 4qeDdjlVqhTD0vAlht0HpEzDZskNDQtvfCdhTv0p5a4+Gj7BilSQsq2YUclJd/Vh3pwqVZph5fgE w1JByjZsRr2foPTJDj05z1awlBv5aPhyK5JBSrZiSi4j3siHefUjG1aOTzAsFaRsw+7KDQ0Lb7v0 zBlaKOXePRo+wYpUkLKtmDH+Qbp3D/KqOVWqLMMS8OWGJYOUbNiU3NCw8LZDTzWn/pRyux4Nn2BF KkjZVsyo5KTb9TBvTpUqzbByfIJhqSBlGzaj3rso3d2hp6ZkK1jKHXo0fLkVySAlWzEllxHv0MO8 2pENK8cnGJYKUrZhd+WGhoW3XXrm1J9yrskjCiDYkYxStiEz6nn6mjxIqOdUodIMSREgNyQdpWRD pgTnGjKnyvBS7sIjCiAYkoxStiEzqvaM2GbmTSCUZ1uCAIJtyShl2zZjokOIn+2SNG+isJQL74gC CIYko5RtyIyJDdKFd5DXyKtZZdmWIkBuWzpKybZNCQ5tC287JDXyqlEpt9oRBRAMSUYp25AZtZ10 qx3mzatZpdmWIIBgWzJK2bbNaAA8FO8lSMpzJw/KubqOKEBmyH1QSjVkhmBGvLoO8+ZMKZRnW4IA gm3JKGXbNmPqw0fxfoKkWprN8c2miSXOksSjfRKP90ls75PY2SfxZJ/E030SP+yT2N0n8bdkYiDS bvcjlvhxn8SzfRLP90ns7ZPYlybe3cvpv7YvVN8TzHsNTNsTbP4amLon2Ow1ML4n2ONrYMqeYN+I G78pWC5x4zcF64G48ZuCNSVu/KZgTYgbvylYDnHjNwXLJm78pmCNiRu/KVgj4sZvCtZrhxQQeR/b +G2Y/GXjN9HcGRu/9eTGb2O78duQbfzOkiPZ+J2VZbvx2xAbv2UkzNr4bck8kbXxe0TLlNz4PaZl Sm78tmmZkhu/HVqm5MbvCS1TcuP3lJYpufH7gZYpufHbpWVKbvz+RnRukhKPxFxJTsyIuZKkmBNz JVnhEXMlaeHLcoktg07w7I2xFr+c80VoJMTWO9dz5+s5u5+uraXlrRzHvocIfgE/W2BjyPNenOp2 hcrVCIDhUXFz60nAWj8sd4YdCuba8Nb7xEaWZ/90bWh7zkbQ8uBOpbkP3aTVg+UxVZw4uhF5Ltq6 sGx0yVw1H8Uf/Z9gyB/Oklls5a+g/QXzaE2loZgNrc5G0K4+Js5C0830gFtyt5Ve1m6rLFH5u61e 3Ki2FP6y5ASa0X779j38jCXALVDGawmgx8br+Qk0IeLVBPFdVpkJUIT5agIDTLBJEDysV3i26xdV U+pfW+KzsIE48BUZsDHE0hn5/iqJYuzywJsEwISZFazwIFX0Pu4Jdp6g32uTsq69cBdpeHDlLOaJ 
QJL/dtBpsQnQHTiw8vGZuIInLcPn94wtF2Psc0RKnTnLpecz47wmAX0vfgLIdIoanS2WPtSLYOUv Fo5Nz8xsy8FTX1fOEiq8JXbk7WSOVoqgrVAUpA7cKcYtsUc0JG20lIgdYBfAhtCkptf4b9wUDRFu +KvlZIgUe0xkuxAaBv56CVHT7+wKy3j1OA+m4e7BnXJmwH1wvfUTA7sGOISj1uo1Va1xlZ2NreXM D/6YjReWfc7OpuPxNpVeU2satjcNrkCNAsM6S2fmWIEDCbvOyLXCNOqldn7OfuNs2B+wu7XDUJU6 4/WW0kCWX98O7wSMREfc/QgtH55l7LTEntzfr6BduIpv5s3Oed37NLwEm/xw8XjIxcNzgMeEQkvf h1Z20ZLkDrM7pqq0mLLzYpfxr5oTHb46W4vxn/NisCFGAtYSks6AGUh9uyCwk9KXK2UA811D2I2J HQIXN8QWIwnrCFg8+VzsNS2K7Owgmxtnhsgf/xwWAXbSlJhEXx1k4okzTgHDV7wEYCcNPDlcYx6h vQDX1a2NaaRwPPt+MfHuoX5Cg1bXDUNt1iV5uv0edKFAb0/afIaH598OuwPU8E3baEIsrCgqNGNL Vfy13ZepGGL8NezesW7jTReLjRhGBzA4U0wVIiXxZ97UG6KBw+9Yf/jmjoWvZoMm4w38CGWooZ5v UE8tkoF/PlhGd1sOvQEydPOGi3IwpQ2v8B39t3kprPfx7gNKNDjn9HIMhYwbJbSVrpDytQe9Tqib Fpa/rcRtjH8+uPz9zpu3m/JrQoaW8OOnm75IcZAMALnelF8XMpp8R8b1weUY3rZDP+rdOsrgPLRV u99FmHftfv/mlm1f2z+En2gy3g1uNjJuGqGtzJ1yYArlsHJEfNRvrlGG0dDDcrA+6zE2+PTPm9uP n/5ZsBxopRYb/NUHIl/+T0gw8dtH33aYjJVZuXmB3HybW93mlsUyWbm1PXOHam4KoFyK+KJIxujB WSSvpu6dmW+Kvc16WSeCXEOwOIcOUOCs1gvobApNlJ3X5eYnVbMMUM52MSJQqqb/8j2HDd58ZEvL mzqBNDaFh16bxV+CkfjSFekDU+TW1Cgr5hC5uaKbRkOe+6O/nEMczbY5RG7aw9qBjsXzPVoNn/Ff 9K/RoOpL4aXyobStVLmhs9XcI6dqhFoz09TrpknMGS+vpilNo07JyVvb1HtYanPRT//yzp1Dx7f3 iQ2gdw3R1hO0rzKxYeYPopk4s8YL9961vyhPCv8KXfyFO958fLn7TdZyDqBr6QSBv2S/wYMF68B6 cdkZfKa16mlF1KQivIgislYvT7aWlK0Wka0WlK0nZWtFZGsFZRsJ2SaoEq1ZKOrFehLROByxkUSs H45oJhEb+yL2PoWQW/9Ztg29jQA+RV20r2wauPcjK3C+KF9lqCEeJGyxjVo4n7EBZS+gF+LSLuVS pbkb4vL74W3n/tOftzg+jBc9wfs93niksOnMH1kz8UFl9mSG/4mlz4dtxmGb7MGdPoQzZgcjcz0O DZ8EtmNPpf1HObSRgDbo0EMnHF0VbFj6a/Fh5bPJzFqRtBLhs2tjC84VG7vOnCFnWsLpNiVM+CwG 6sVwxVm/3b0LB/pxgA0nEt3pOpwyZa43wWc0/i5TLZwmRVi8Uw44KEYTg+35Y3iB1oRHvfiptWix aHyjtWGqtOm6uYUnRUKWqjdUk+FyL1yQhbcBsfFiLUZ3ZOHX2p3hrVe/IGKaucEqqDF2J2YpFhZG TqwOdm2qslHFzQjvuPDgYs9zV641c39hcaB4v8l8N+h1cUj9IVwpB83+aumiviL8OhNnqbXEfJkm lsMJ28gse/O0cjwc3Oz0Pw3ZMwRXLTETJMnWt5aPqPbdsMPWXhAqZK8dpDN8F+CXz974Yel77q/k 
8H8G2grik9q4xbrizhxQBmPeGuec9d/9whuDwueWbGym43uBP3Pw9MwZcJD9+bb938yEaCdvEH2b EW96EkwFGq3WSyd+dGZOlsFnjOeiDOwPth0MDKDEOBXG+tcEEH4oSD86D1I3obOgmI9XqmGqnOPx kC9TgGdQX4zHTSYwkO0ATYymotUfWTRidyGi00dRhy7C8yVd4KiMQh3g8GgZ1kq8fCm6RGslos5g 4YzdCd44JRo8MC0z6opWq3PoB039fm8wZGezxbffwefcVBqmTFwXef8crpnMrg68qWuK/lIjNCiq Zhi6Dj0WUqXoYe/iMl+CqjQb3FATdY7XG42Gyus0CeKA0NckGLLgXrSHHzjrgf3F6tG6/p6d1fVQ /hU2R+cXrLuxU8YfSfhqhM4VtQgEU67ooyNC5CCaZHkJV3vdFjHvNksHp7Up+aIzVvEJaM3E6ZbY mUQXmLvHq772aBYCl84PN5wHw/Errsp8GD6Mw1lKERSEVcaNrmUNZK0Q1Nw1PMu4aiiKIZ0H2zav XK0ZeGccNK4vYmWysMeG6m6bY8av9M0A2NPezzpZhiKNilJT9XSjAn7gOo3t/+9rE7+ijxYeozbJ vNZdW7NLkRoHVT8tgMe+d7aan8fAVFWFx9vKWSzQn4qsb5KmnfpCO1kfOkU7WYZitNOMLNo16up/ CO3UK/ow82u0o+XdvxGvhHbaC+1kmVO0k2Uo2Npl0g6eBf8htNOu6PMTx6Dd/0VrBy6ePqzYesF0 1Es2lD13p2GH/X7sB6vfNVO70BuyOEB6JSttYBmHRVbPC4ctxm7WfbI5ndhOLwp8+v3Op49vem9x gCC2CMKZhC8K0EefzefhyAVb+EHgYhQNAdhmHbiitLhsoJuKI5sb2AwcYey2WEKktYzGZUn5Qots RgpxSGYbBEYjRbTB/k6P3eKlA9dL15467At8oXxlZ2KwhTDOgrYQ14MvLHGPxyiECZdjtPBfvUab ekdNetsyfMBr5L98+Pi+Dcr0bv8RAMkYBwiT8eY5+y/lYjumSluBkQN/fVz4TjY8P8gk3aPqfHNc i9/kWJzTplZyQI9h55uj2vnztRLBQ4Ovcmj4maqVBt9vd7PhDzHJ52v1uDp3suFlz6dXMNv/ah9T 5WH/OsfMB9B50P+cDSoblnxN0XYO3Q7hw7DNyy99++64HmvncFhm3HDh8mC2njIcvh9g0L29Xl6p NRvsbHzO2rY1Z9cYkUvgFt4CNPQG4dQKDpvuGUN5C3oMlRDWgnBhDSXgWrSJjCRZ6JkQv/bICgw7 wx4L1qPN9gA36u/sE/0JBTDSAecRYxyRuzdhVhSXRfvkf/rLxwuGw8J/g2D0d0Rz3OX3v9UYJndX 7MGZLYILDO1WkHvpoJvlwi67/XZ820n77UBG1W2u7aA+TtClR/Y1rjRU9l42uL+FC/uDb9u3dxAk 9vufyXrcisH9ze6c/jXOlQkABgGeBZ4TG2agYFuFKdzDQNBoMdfHZUxizY2YgTTNcOUS/jrCEF7s ABk5jredYzgA3RHrgwV62EUojt5AdFHVI/ixhchj61DQpMpjZWsQ+BVVHvvrmS12LY2cMiQ4LxKc EiRYKcNYAt1qHGCY0a7aUWcPgKPO3mFqj3clIHTzCLjjjdrjw/iXwnUiLx5BZx7Z2m4cYGvR9oa9 ytY+vUGG687wK/9ni6H5Lsfyvj1j/Zv+NtfEaYZFgF8oIwOMDW5v3tzcdd69QESMmzTogwup0lr7 lZYa3iRLS8+VKiU1a0bpNCjgcUqHiw9b7GHhrNhYbPwMN5tt646Lywdm/05mkkXD/7YkgH/2fy7F 4Z+zZ+mwxbD3b66sqLmyijZXVtRcjQ5orib7lRafQZfO/qUdRaUdFS3tOCqtTSqtbPxaOis4EP0N 
55U1JNEy7e0KD37BTM006wpxXdVdZ8AcsS7KDR5AR9JCFR1X4Cj1uqY2ietIUIzY3JyFXzcM7aUI 5sW2VFToFnu3hQ22CwShNGfxokXFEHoImRStl47n03u84lbdLp7R8H3tr/Cqafx5X68Z0ta4iylf WWpj8GghD550ItbSERcJiecW9qwecf86Oxv7i+eli9M3Z51zxpuA5D+6yz/mvmfZteDnqGY757KG a/i2J84jEEfEzKwlhHDivIQrV2wU8dbz0ebgA+gEj9ZT4ixDBPsPNB/rW541deY4wr/txUsAXJ8F YEJ7PQNDeb6/x3BFIqvtWDZOABbMPp58j+VkZ7YzsdYz6Ro1K8AN9ffbSZV7vGyWvREDJ/1hD1qC hTVyZ+4qb5xHHGswd4IA7IbjB+F95ZqYoZUNuFhY/aCRgmAbet/xaLK1GLvQrY2OLhA+d/COcGkb loNpHwHTOQLm5BDMcOjOmi6m1nJzqfHEgpDyB4RmiirG7brWD4f93fekvpn4EIzazuoB15/eOnhE gcNuvCkwVBDMe4MJGCTADCtmL11IU2N/bs4yUGp15dAx8k0NxmlQHAsjLtdP4m0ta9aUL+2vOG8e FwEfcS+AqrEzscAez334eY5fkyRurRQNRL3rvX3X7bcl2SCDsmPibXMDcQquYW+NRyY8OkRFXPmJ Yhw4W7JrVdp+mxyrNtNW7b5YVc22qkxicavyglZtSq2qiXNTsb/Nhit/ia1dx4dHpT8LG+9w2FVU AjF4uzn/Q62p9Zqi1hQQceDE347fiPPuWX6riz5t0m9CxMZvvJnpN6nEYBy4CpPYSmrny+bT09MG rLV5FFkS+4NNcKQPuhView9oC1Gj1OIpWe5yLsS8uWn+xbQajuKAoRTjQpzJwK6TX6sXYt8gRsV1 mSiU8HJtd9ddOuPVZXuMi2HERsZ2v9MJd3PXDWV4c8nrfdbtDd8zIY8NoDwKa38cgkdlEzhbUbw6 UWp1orTqROnViTKqE1WvTlSjOlFmdaKaFVbhKpuLCtsLXmGDwStsMfjxmwzZoqPdMIG2nz0rTGhk hgnXL2GCmR0myCSitXiZYQLfM0wYJcIEc68wgeeHCVpJYQKvLkzg1YUJvLowgVcXJvDqwgReXZjA qwsTeHVhAq8uTODVhQm8wjCBVxQm2C9dpS+BbRW/y4SK/+pVJvuBlHeTyck4exsHD7hvscRh7FQF 26sVqmGHa/jwZGkaRHRz3ehIjojhF3dECqQilp6Mk8PSUYslTv+nKliUpdHVdeMjOSKGX9wRKZCK WHoyTg5LcQQ9ft0EVcGiLI3urrOP5IgYfnFHpEAqYunJODkstVsscb8JVcGiLI0ur3OO5IgYfnFH pEAqYunJODksdVoscaEOVcGiLDU2EJMjOSKGX9wRKZCKWHoyTg5LJy2WuMGJqmBRltY3ENMjOSKG X9wRKZCKWHoyTg5Lpy2WuDKMqmBRljY2EA9HckQMv7gjUiAVsfRknByWPrRY4o46qoJFWWpuINwj OSKGX9wRKZCKWHoyTg5L3RZLXIpIVbAoS5sbiG9HckQMv7gjUiAVsfRknByWfmuxxC2cVAULj+pH MwOPxxq5jgk4YOg6hVLVwP7JPrlcfWyxxN2vZA0LkzWaIJgdyxkxAQc4I4VSFVlP9sklqxA/o5I1 pmFhskbzBPNjOSMm4ABnpFCqIuvJPrlknaP4OZWsMQ0LkzWaLvCO5YyYgAOckUKpiqwn++SS1UPx HpWsMQ0LkzWaNfCP5YyYgAOckUKpiqwn++SS1UfxPpWsMQ2LkHW7wBkgFsIXqmoqmvqaL7Rms2k2 ab7YwS/mikyQjc03a+yPQdSTbfJJukDhC45vKr7JduemNS3K1iju/X4kj8Twi3skBVINW0+2yWbr 
dxT+Hdn6Hdn6ncjWmKZF2RoFvsuvjOt13VR5o6685pKGvp9LYgKKuyQFUg1dT8bJ4esShS8JIcCO fkVZGkW8wbEcERNQ3BEpkGpYejJODksDFB4QWRrTryhLo1B3dSxHxAQUd0QKpBqWnoyTw9IVCl8R WRrTryhLo5VY62M5IiaguCNSINWw9GScHJauUfiayNKYfkVZGq3E+nEsR8QEFHdECqQalp6Mk8PS Hyj8B5GlMf2KsjRaifXzWI6ICSjuiBRINSw9GSeHpT9R+E8iS2P6FWVptBLr6ViOiAko7ogUSDUs PRknh6V4iIktvZQ2rV9RlkYrsZ6P5YiYgOKOSIFUw9KTcXJY+ozCn4ksjelXeHQ/miH4dSxPxCUc MIidQqlohP9kn1yu/kLpv4hcjStYmKzRBIFlHc0bcREHuCMNUxFfTyZ6jbJWeEQK5YyUlI57s1Z2 V9/uacRFbtTbHuNr7B5XJ0REpxHz7NOIpWYYB67K4Ke1su69H4TUGjk1JOItNmzftdncemKfu/32 Fdc0Np7beFnTy4uPdFNh49Us9bWistHcnls7f7DGisLc5Xd5+UAHlaiDkq2Dkq+DSdYhssMMPWf7 Pz12NhyurNU6YAqwLTzEj2nyu3pj5TkYS3Yp5C57i5zNHWMv/3KdZC9/Ya+SzV6ZROSjvhd7jX3Y q9GYY5mZzLHMPOYg0wRzZOUDHXSiDtk1yMqtQbj2i6qDViJ79UrYq5Z8vr5RU790kuxVDz5fH/lY 34u9jX3YaxCZA28ZzGmOc5jT1CP2ysoHOtRpOjSza1AzrwaBDiZZB6NE9tbLwhp8HMDb8EqNn+D6 Bb5VNEVrDYbq+wv8MAk/9L+Gp7vWlQt400XR+QWXXTUQOEvXbzHXVHS8dbTLRDSVRtoHpv35rzwY mTpzd+xsCj3314ETXSww9udz3xMXC1izmUgmQ7JbbGm5NmcLZxn4noV3uMRviEEsUSWlhXO9xXoF tf+O3S0tL5iJG16GzoqpkOR55EMwzayAXY1nVhBcidThu6zdRP3qiL+q60+cMaYadVxueyW7nCSR UYWMmlbX/pe7q+9OG1f6X0Wn+w95nkAsv/B2dvcsCUnL6ZKwoe32uT09OcYW4AZs1zZ56ae/M5IN mBBLxknvszfbJQE0v5FG8mhGM5JKE5pISPVmacI2VrXVLEMYx0znTSRGu9UpS4hNJJautcoSmpzQ KNVGceWGvZgF4I/MlxlQbQ0je3CzodeUD72mIpQlh5IdKppBmXIoU0FQoDLtVRIs7cRz4JF8TMU2 YXFCwEFybuPVcomfTFe+k3iB3yUz5oOacG5AoDInjWwXhvftpgHzK8xCKH/mqNVP1Ggvf1LbQVS5 mcxZTTxH/Xoq2W10MrUjpW9JAGBqbNC/yKc/e5ebG8xpo01OmU/eRozZEfl1xn9P/nBs32Wo2px5 A7Tt7zJTCLTwZDWLie26ULfJI96s5IFabJChx2eqX137ji3/gJrP7UQFEgdoD4aUy3AZAbvuujfo EzuK7MdYdnA4EuN4jFY+aTRUSjuBH3suDAhgxFcDFOmwxYfT/ChP8lie5KE8yX15krvyJKvyJEl5 krg8SaRKgonM/I5a15tO4bn0E/LxIwzTJFBaU0oh9OoQtDJEWL0hYfWGhNUbElSH8KtDLKtDLKpD 3FaH+FYdwqsOMa8OMasOMa0OwapDuNUhnOoQk+oQdlUIJxIXbC5dmVuPpfHm2V9RuavYHmnhuEzh pEzhVZnCd2UK35cp/FCm8GOZwj/KFIb+VikNBp0PU3SXpCSCjaiZaIxovxCZkLLoGNGXSt2fOgzp YoOw7AJwk2x0HOwFevdYhMd7iOwWwF2wHwVYsquCdrEeC7Bk7sAu1kMBlswv3cW6L8CSOaa7WHcF 
WIqe6RprVYAlUyC7WEkBlmxdaxcrLsCSqcFdrKgAS2UFCj35xSJwuF6lmtXUb0/5YoBcx6bknF+6 oEFiliAlscGHu2OEX99LNRKsErzoF/4SFY+Pt5ZYZOJDR9Di94yDF+z5CYB1ZUsI9XqdRG4XGN7j q6y4ENcxCbr0GKvYVdgLlvZXjki6NScdMDki6U6JdMTmiKSJ6+kjkyOS5hGnz2yOSJrWmSqNHJE0 yy7VWjkiadJTqjZzRNLsk1Rv54gUzY3cIgF4fmX9SmWSQqdQundW6hMqIWyqHapXu8AFVKx2gQeo iFDgACoiFPh/iggF7p8iQoH3p4hQ4PwpIhT4fooIBa6fIkKB56eIUOD4KSIU+H2KCAVunyJCgden iFDg9CkiFPh8iggFLp8SwsbjkwYPN85FaJTwRL4rFd52RIBCiQkPQKb2kbCL9JxdpGdmkU6WXhQF kSxEtWdO0svPSaokxTOKSndszSjqTIvmAxWmxfOBGkLRfKCGUDQfqCEUzQdqCEXzgRpC0XyghlA0 H6ghFM0HaghF84EaQtF8oIZQNB+oIRTNB2oIRfOBGkLRfKCGUDQfqCBs5gMVAz6bD/Qy84FS4dx8 oP+uxOTpfEBffD4oH/s6gCRUD5cV6eLKsRlFhCJdXDkyo4hQpIsrx2UUEYp0ceWojCJCkS6uHJNR RCjSxZUjMooIRbq4cjxGEaFIF5eLxsjW17Z1cZkowffSQYLvfM1fzuSpLtZeWhcH5RXrASR+eZJl eZJFeZLb8iTfypN45Unm5Ulm5Umm5UlYeRK3PIlTnmRyQHaUKslGiciiO9vRwzJKZFKmsFOmsFum MCtTeFqm8KxM4XmZwl6Zwt/KFL4tU3hRpvCyTGG/TOGg9GwU8NnI569L/rrgr7f89Rt/9fjrnL/O +OuUvzL+6vJXh79O+Kt87O+GCIOicGPZmKpfBFY2qLosAisbVV0UgZUNq94WgSnGVddg314wn8B7 wXyC+QvmE8xeMJ9g+oL5BOwF8wncF8wncF4wn2DygvkERbkv5fMJrDbd5BMointPPoGZzyew1vkE 1s/OJ7B4PoFsEO7LJ5BGnPflE0wOySeQXlm/L59Aup62L59AeqHzvnwC6f26+/IJpNed7ssnkN4+ uS+fQHoZ4L58AundbNnMkaOS3pKVTV45Kul1Rdn8maOS3huTTeE5KukFHpkVkaOS3qSAdhK4B+s9 G/2ry3MVZyFi8aPv4JO/2Ryi6DksPd9brpbkZrayI9tPGHNvCIlD+N2FftE08p5vBTrBBqnURewv WtoPHNa+s70F7icmngsvgysysX333nNBX9UmoK38ABRZEDGSzG2f6HybasryiOtH0TZ1zlRv3/Iv g3sQ/h2LiE2SIAGdDeIxOlpLa7eMJpmALr5V2kADFVjxPVsbKaPmnUbBUuzpCgNQljKkC28B5HHC luTN0qVvuqTPj+3g28OgwzwWgbJGaaSHd4j9Q8mckZXvsmjxiCWFVpew+nwxxn2iPt8tNN3wlccG Pl2Mu2SIpMA9CqAytYdpvIVxxA8eCfzFo7S9EWPIf+WvYgATH5Mlg75+7BK9bd6CBJl0u9hPFNvt t2AVwdzukjixIxReg5CzYLn0EtzhyaI7GEMWHg4T+K5s2ev88weDQD8EPi6hHQsANBxSJgrk9Wnc Ff2I2xA3cuBzfRC5fPubayc2FHJZufGn/2fGn8wCeb0uMP4fdMFnfLomtgvKceY5xF8tJyxSoRmf Emi4B4wYmYI+zT817S41di055sUsQlLQVF1Qo9CF8OTih/g4B9HSTsgbo9F8I5oC0vZd3LC9RzRy eKF3n4ijHEjKmYQwCy3jteEN3x2T2PvBSJt2YObOik29KE6EEicUDBU8JyHBfehgIfswb+mm+Gxi 
J86cdDRNvHXEWLJnjBjaNpX4pFyVufLHpnMIm++nJYtgRmrw9dEOllmM9ZHLMLIw8jDnsQbcl+rb S7YZ5mZXgxGSVQoM72Hv+j383iqgQwGzqIBZXMDiLKyiAvrWKN1bAFm0ni/Q5CwKCyCLdlEBZNF5 vgDi8xTn5wsAC0qLChjtrqanBeL5KsEzK75Qi2qdr13+Addu/CQLtFJSxRCxCcyaWzCm3jXpTtf7 0xjMioUNAxiGAFooGG5iD16y9WAXka589oAqWuzIX2xpplhC/15MwzBIZ0hcC6MAzKw4CcKQuQ1l YnjIGZ48AQoVLCObK+od4kx22CyuHnx4jGf46G55ia2uBb2d9SQ0AbDBO26YDfoLbXPTC+eBxjME 62kjR3bMaxiDogD98Rs5wTae3C7jmZhU8u3cB/en568eCPRLjA+03mg2dLMeOXqdF8O/SA2Noz/4 +yNSmznOujjUoaGjHdukFLxikDBoxQWzYwYF+2zi2aIMrev06Ij8Qsl4OCJ/g+LESnF1YXW6GiWn 1+MPiNOS1BanR9DeeLIK63Kj7bcTUJ4nmEURBURCfTq4GtdBQnceblgP548xHlwAbsQQlGPYlVAL ctbWtS7Rdn5IffujztSEj2orflLd0WGwAiMHa3NONRgn+CC5BwKzJ/Wl2ksA011BuK2pK4APF8Qa Iw/LOCyexcRn4EOR2Q5yO+1MgXz5aXwIMHs6JKbZR5VEPGXOE2D4iL4AMHsKPK1eY5qhbYCb+lrG aoOC+e5NOPVv4PkE9dY0LUvvNCU0/eGAhFhvX6pMxXFe1+P+CGt40bM6YChpYFBBw3X+bW8oq6LA +DzufyD91kUfm40Y1hlgUKK1dXDD+de0Y7a4ksPPyHB88YGIn05LjccF/BI8dFHPC6ynkfHAryvz 6K/bYbaAh9k+p7wdROvBj3jF/kt/NDK4/PAncrRgClBvx5jzONeErExNia43GpyJuhmi/T1tW8b4 deX2D88u3qbtNzgPI9ePV+dDXqISDwA5Tdtvch4dusPjtHI7xtc90Y9mv4k8KBWy6g37CPOuNxye X5P1z/oL8U6Nx7vRecrjvCVk1d5pB5bQqrUjG4/m+SnysFqmaAcZkgEho6u/z68vr/4+sB0opS4Z fR7CQK7/LgYY/+sSfDoiG5X7qOkB1HRNra+p6QHURklqUc20AVqd2xeHEGYT5yG0hl6amKbNXpPW m4ogp2AwLsFziVmyCokvaqLt/NTT36o12wNKyS5GBqpa038FPiOji0sC3vaMxVLbFCa9Htn+4SMS f0xNOmFyakPPSJGCU1PNbFstOfUlrrEs+AMnKDi12mQ9DO74ivkPbDD3WXiz0c1ktjPn8pTZCHa0 eLzBgmgmfDG/ZoHCjfykTQCBdZ+IDry3TglK3RINJ+222Wy3FSm3RWYYWsdqqlDS7rp0CWGnp5cO 6x+8JXjigysyAs8aDLYHUNEytoL4T65parYTejee+0V70OhXcO9Dz0nfbg60linfEfiqLI6hp3+B uQkfo1VYPxt9VJsYnlZEz1eEHlIRmeJ8jreR560fwls/kLeZ520cwts4kLeV492GqmQHtB/ai808 olUdsZVHbFZHbOcRW2URB1cCct1/tuuCwxLDu8zL+0pmsXczsWP2RfsqQxV4ULBL0mphjD4FJRvQ Y34SsVbX1bobTPub8fXZzdWna4xf4um18HqDx7hqZLYIJvaCv9GJO13g/4qtfx62sw3bIXNvNhdZ IJWRqbkNDe84NnNnUhdUDm3loC116DETK6t8NETBir9JAjJd2IlSrbgF7rmowanmovdNCY6ZLu90 V8XSEIvxfMWjNuz1P4hANK7TYXKMN1uJNCDi+SKUAn/LqiZSfxAWD8qGMciXJ+N1nAtPBZ7SbCFg 
Zoddki2RdNORKpXccCQY3SOeSWDaiME7IPMgCRerGX8v037n11AqV12d4oHneD8GXmCBp6QSJ1zx NSaZEbjyFngaMJoxCy9OYpCXMOJ4qOiYLIOJx49bnUE/h2KRuEHIBx6sD2208UgTuq/dkq2BjoKF 5zxyTt3U9JJQpGvZTqXF04HvJZ698H5g3UFwv8gG1mjQTwM83MBjfhJ52EpuXta4WLo8QcXQW81U 6rJuP39ImI+Lt2fDqzF5BMuvy9MoJGQfxmfEgbpPIp6RZs9sz48T/vTIzFM74lEvRFgBDW+Ku2L4 lMJnMX746DvzKPC9H7mIxj60BMyuhtMlfX6+KdQEvYEGpQYZvvuBp7uK6Vi2anUW+HGwYLjXZAGP Fvn0tve/pA1G3HPBhozQEYTkS5I8buw0Gbcs8mfD85CsIrad0/8Myegj2rYZAfmDrNdWeXiT6njG tQIIrQoyzHIwzLZmddra7YlutXVKMSVjk65T05uafpvlbTjgScCotDp6E0plC6DHBPqpdcuVwTGe fX0Lj7iXyEbsWTrsRELBws5OSU64BR6HzPGmeKQwV/4gWmI1NaNhmuBWzoLhYDQmtUX47TcYKLQF jZCx6+Nj9iguy9n/9NGOaWjm5gE0oKmGZZkmOIBKz+AAPa368xx0rdOilp57xGmz1WrptKnGgSfl FHGwZI4OV+x/UjIA+fNrg5rme1JrmoL/CWrAo2PST+W050slfD1DxxD8ARBEO1FfbOIsR1nMamO6 D/pdRdo1yRmmoKnQoRVgL3gmCTrUePH8vZc489RM+Tjiz4qSpcJ5RuzOE3FGXBGkuqwbhW2C0/NC 2EjiqfGyqzdimSJaK1qqNyw86RvU7AZIRi1pvrCRpO1HxxYp1uqd0BMzXWp8kPmbT2ZdGcEh+kZr GNpTfaNRy5TaYf+QB42eqK/LvsaDJuu1/spe1HlpXL6+CmHMBX4tWR5tgem6DjNfwkJuPWoyF+61 Bq++GbyyBYsng1dG8LKDVxph+IcMXv1EPSxQNHjVaMvPEv+gwWtsBq+sCk8Gr4zgwMFL9w7e5n+L 5jVO1KNSrzF4/wOa9xS6eDZPyCpMFyYk5X2W3PDkx9DGq48oMOMylpA9BNHLXMDybJRh9wIWvd0w n17AUlA7yfUrOTzZYJLehaIW/MClu+QxZCR0vH0XuTyzlnE2yKzR4fDs6vJi8BYXsbZyfdhU/KgA XQZkuRSrayQM4thD7was4jQNWNO6VBaMUcWRxa/SxU00qMMIjOVIcVFA0NXS9PmYjDUypmRskLGl tp4qhJkuhOOK49qozxZC1WJZZwNyjXs3TiPPnTHyBT7QvpIaX0uULyNyMfIrvUKbn34yETD1zaV+ zYZacsreOwYv3/egMoPrv2IYn0BPaJvQzhH5H+14HTJQy1F6Bv70deHP9sPTSiLpv2qdz19X4ufP SJyqRQ6fAX0NOZ+/qpw/nmoZPMxXOoVJkujGi8EPe/398FVE8vFUf906n+2Hl63oF2D2/tV7zSqP h6fPiLnCcB4NP+4HlS1Pyy7KfunxwC8XfunW9z68bo/1nhnDMuGKRP8RBscwCDRCB2V9JZzW6LRI zTkiPddeklP0XiRwoR9CDf2RiBziSnhJ88sP1c2vHLNsixc1sn3/Spx5PXPsV75yBfhV7/Fqku58 8TLfsIzhyCuAlg5e+6tm43DqwZTYmUmX3Xl/H0S3xwRX+t+AHfsbojEv+v6mQbC4l5A5W4TxMVqF CVBHDLtZzqzeH/a290r23o5kQ3VNtY7TYPz5abDGoFoLr5dVhROuw9ve9QcwEofDj8r1uObxmnRz 9PAU47gcgICBZ2NUFnd5QsPWFZbFDsF975J5yMCn4tuoxV4b3MU0YQzj4nECzpf8oU7HDr8ouku8 
AHMGeXYaj9W32yJNEP+coCOx4ZBFoCrhM56Oz/GFq1IFv4X4XHNkDBwbsR27Omy+2o62Fgv8idV2 gtXC5Xt3J+xleLAND/YiPOwn4rE5vt2qJJ7JbtUz9xOgM/ezatWdXR4I3nkVZCetulN1ND5BZll/ vkq9aSZzt1VB5ly1C6e1W8bZJJi1iR8F912CIqw78lUHQobnwzXVlHVEE+APlTULQkbX5xfnH87e bSCykTdtqS97PGmtXa61qtZTvrXqVE9aqUq6p3UGNPC/vHUyG/kf2zr45/739h38YyVbhw+6W17L 2JmWsQ/VMnamZSYVtMy0XGtx8qiz8q2dZK2dHNpaJ2utq9Ra2YK4NDo74l4IK0gWyvYmrFN56DFp G+12U1PM18M70xnPmvPiOdRxHxNLN/V2O8eDYpJWhyomJCETPJ1zL3rTsoxNA9rH6zapQnfJuzVs vM6JhbbUthuWNoJXg7NUqXTE/EDdCebnFfXxpK3vqyDBM7vx902zYUlVcR9LFiRUWTRL18Lz6niC pmIqGPd90Nm6xdMaSM0JwsfIw+hX7eyI0A4gBbde9Mcy8G23Ed9PGi47kubdvB3wc3X4cS0LOwKz ix9+cuLx3VXiLBl+gA/4xZPVTDFmkcH+heIjQ9u3Z2yJi/5rx14CYGOq503ywLPstxcBSA2/qeNR UTKReQGJoRvc1QKE7QdBiVWQHKnLbBdjsAeSO9PvW5Sk5rKpvVpIsxntGM+1uFnHam7w7Btywddj huMBqJLQFnnOzwDxg0CWLI7xDJqIheLcZ4MHyWXrOGIFy56FM9wlxuswtcEVvwNbRJNpu2kAXrvL kjnqTDy/gpFzfwYC5O33L/BrAl9j8YS4kQdlGuRTetCF1mjSqivD6SDFuCGuACnuwcjjrSe1dkP7 0vuKkfVtFvAWN3joBqnxXRN4Osj9EX6sxHEtoxwfGDlTDE+jdEA7jN79H7n6OABHyNJMg/xBaFNs NUFviWndNu2anS41uk1ZiPg5drh9wl3axIlXS3K3sH0S3kdOsiDL2TLBkYLbLGYTLyEL/xbekmXs wfMQO/W7Q2S6HXHY7SO1LVnP9FHnaR/1N32k7+8j5XGc47PdR3RfH7We6SOZ0niO3Sv2kXFvRwxP sdLIOAkiVBRnAcxUwULoPbEQyh9QvpyanmCjN3TwnvWGRrVK8b2nT6piEH3fKGhyNzA/CjiLdBTQ zt5RIOUYO7GnEYmspHKudx4eHlKwbqrFbYn8QSa4UAYmPd9PBrUFk60jk/gTXl605GwuzjufidHA hQ8QlGYd83NEyGn+Y/2Yb1RFi7QpXWYFDgREz/8DeyliTlLvOZjKw3fO9oZnZ+IEgqaljc/rtDkk /cH4PeH8yAjao5He5Rh6VBZSWbOiP4+V/vNYGT+PlfnzWFk/j1Xz57Fq/TxW7Z/H6t/sXW1z4riy /iuq3S+ZexNi2QYMtXtqSUh2U7sknJDsbp2pqZSxDXgCNmObvMyvv92SDTYvlmwHzrmnSO06A1E/ 3Wo9klovlloHrMKHbC4O2F7QAzYY9IAtBt1/kyHaBrQeJsgdoLAtTGhuDRMuVmGCsT1MEGlEb9GP DBNowTBhmAkTjEJhAt0dJmgfFCbQw4UJ9HBhAj1cmEAPFybQw4UJ9HBhAj1cmEAPFybQw4UJ9HBh Aj1gmEAPFCbYq6HS59A2v5CmQXWVvZBfp+oZzgKTiRnYrBUOwRA/CMmJDo26gW9fi+YLN/H/ClyA 7Ac+vphJXDxnQbTQsQvE4m+mJGthp+y4/s1vkz1T8V510u3fET8g14+do3PKOwdvCeL3hxY2sBNF aIbNd9XhVRtyEDSGGO6pIFL45QtiA+RALD06ZwdLh/ye3MIGlmWpGkNYeyqIFH75gtgAORBLj87Z 
wVKL3wdd2MCyLNViCHtPBZHCL18QGyAHYunROTtYGt97XtjAsizVYwhnTwWRwi9fEBsgB2Lp0Tk7 WOq0SeZWQlkDy7K0HkOM9lQQKfzyBbEBciCWHp2zg6WjNslcgylrYFmWNmKI8Z4KIoVfviA2QA7E 0qNzdrB03CaZe1dlDSzL0mYMMdlTQaTwyxfEBsiBWHp0zg6WTtokc9GvrIFlWWrEEO6eCiKFX74g NkAOxNKjc3aw1EXtriRLUwaWZWkrhvi6p4JI4ZcviA2QA7H06JwdLP2K2r9KsjRlYOlZ/WRl4Hlf M9cpBRWmrjdQDjWxf/TPTq4+o/pn2bn9lIWlyZosEEz3VRgpBRUKYwPlUGQ9+mcnWZn6qSxZUxaW JmuyTjDbV2GkFFQojA2UQ5H16J+dZJ2h+pksWVMWliZrslzg7aswUgoqFMYGyqHIevTPTrJ6qN6T JWvKwtJkTVYN/H0VRkpBhcLYQDkUWY/+2UlWH9X7smRNWViGrMsNzgAxZ2WhqoaiqXllobVaLaMl VxZr+OWKYitI7PN4j/0+iHr0zW6SzlH5nOJDxYfonc1NS8uyNYl7v+2pRFL45UtkA+QwbD36Zjtb v6Hyb8jWb8jWb5JsTVlalq1J4Bt8IVRv6IZKmw0lr0iaerEiSSkoXyQbIIeh69E5O/gaoPJAIgRY s68sS5OIN9xXQaQUlC+IDZDDsPTonB0sxUtX7FCSpSn7yrI0CXWjfRVESkH5gtgAOQxLj87ZwdII lUeSLE3ZV5alyU6sxb4KIqWgfEFsgByGpUfn7GDpApUvJFmasq8sS5OdWC/7KoiUgvIFsQFyGJYe nbODpS+o/EWSpSn7yrI02Yn1uq+CSCkoXxAbIIdh6dE5O1j6ispfJVmasq8sS5OdWG/7KoiUgvIF sQFyGJYenbODpXiIiS283nfTvrIsTXZive+rIFIKyhfEBshhWHp0zg6WvqPyd0mWpuwrPbufrBB8 31dJpDVUmMTeQDnQDP/RPzu5+h21f5fkatrA0mRNFghMc2+lkVZRoTg2YQ7E16OL8ihr8iNSZM5I 2bCxMGtFt+etn21c5o675SHA9fXj6piK5Gxjuv1sY6EbrNBVCfw2I/PJe5FIrUmnhkS0TQadhw6Z mW/ksdvrnFNNI9bMJsrbUDcUgscX4z8VhQxneLix8mZa8AGPMhbaDvhqDr6ywley+IY0fmL/FD1u +68eORkMIjNahEQBlvDD94gmvvU2ZWtlLNH1iuusK3NdY4p19PNFlnV0xTplO+tEGpFHeiHW1Yuw TtvNCtNYssI00qxAtjBWiGwHfD0Hf8VqU8/iG9L42geyTj8I69QPPmu/XlM/X2ZZp1Y+ax951CjE umYR1tVzWIENEGdFy0qxoqUnrBPZDviN3fitFatbRhbfkMavfyDrGh+F1b/tw2NwrqZPOv0M3yqa orX7A/X3U/ww4h96X/gpqA3lFB46yzo9paID/kMncP02cQ1Fx/syu4RFHZtIRWA6j3/vghGZM3Mt J870zF+ETnIAv+XPZr7HDuA3p1OWTIRkt0lgujYlcycIfc/Ea0LSl5AgFqtKwsy53nwRQa194He3 T9klIgMnIiokeR/6EHQSMyTn8f2w53NIAeizc+aOc+Yb5Zyh8KeoHUS7G6g3auhvlBCi1qG76V2c i+4syQiqIKgp4PmigjoKgsrCggaaCuQsIBiGjsqySLR6wygqiFnE6L9VVFBngmqjWUCQ3xFrTsc+ xPOTWQJ0soQRVeiEkg0xJRuSUHUxlOhQzgRKF0PpAii8ZcpaDF1L/pog0bVioroplBfdhn7fv8wA LOw5ibCasxZs5uMFRqLTndcxIqs4hqGoNfpP8ucfndvVRdm0ZpALxyO/Bo5jBuSnMfs9/MUyPdvB 
dsia1KBp/Ico3oAmc7gYh3gpC5g3fCdd88WFNqxGei7rVn6yzRdn9gsYPzEjGUhkTWcR+baDY2Os GPedmy4xg8B8D0V5ReEB5MHDy5gouyCcWwYf4kZUBsIE/cHCI7WaTGrL90LXhrYYbGWjZEk5NK28 zPfiIu/FRd6Ki7wWF3kpLrIoLhIVFwmLiwSyIrjBl10XbLujEdRuLyKPj8D0yJeaa4kh1OoQtDLE vHpG5tUzMq+eEb86hFcdYlYdYlod4rk6xNfqEG51iEl1iHF1iFF1CKc6hF0dwqoOMawOYVaFsAJ+ Z+PMFg3jMTVeiPoTNu4y4UucOCySOCqSeFEk8UuRxK9FEr8VSfxeJPH3IomhvGVSQ0DnQRfdJrEI V8Mt45nh+ecu417mBcPLUqr42dCpnUwu8MjOh+GPGbk4AMLRPCZh6yBEdDveOtj3HCzRFTrrWO85 WKKRzTrWWw6WaLy5jvWagyUacK5jveRgiUac61iLHCxRA7KOFeVgieax1rHCHCxRM7iOFeRgycws AZY5nfoWa1epUm/qzxdskC9uY2Nxpi+eqCChE6EkMWEY+OIQdosyVYi/iIg/wn/FI7rT1NSJyH04 lqyz269hQO96EYC1ReufZ2dnJLDboPAVn6Lk3F2nxG/TUzSxLfGOVFxeGSHhKysxYTJCwjcIYsZm hIQbuuMqkxES7q+N62xGSLjdMW40MkLC3Wdxq5UREm4GipvNjJBwV0bcbmeEJMONzCQBjPyKjiul RXIHhcJ3SoVjQimEldlzebNzhoCSZueMACURcgaAkgg54z9JhJzhnyRCzuhPEiFn8CeJkDP2k0TI GfpJIuSM/CQRcgZ+kgg54z5JhJxhnyRCzqhPEiFn0CeJkDPmk0TIGfJJIaxGfMLFwtXgYq4VGIl8 k0qcHoiAhJQStuAYx0c8LlIzcZGahEUqmblB4AeipactfZJavE+SFcnvUWSKI9WjyCvN6w9klOb3 B3IIef2BHEJefyCHkNcfyCHk9QdyCHn9gRxCXn8gh5DXH8gh5PUHcgh5/YEcQl5/IIeQ1x/IIeT1 B3IIef2BHEJefyCDsOoPZAL4pD9Qi/QHUokz/YH6Dyklm/0B/fD+oPjaVwmRufxyWV5bXHltRhIh ry2uvDIjiZDXFldel5FEyGuLK6/KSCLktcWV12QkEfLa4sorMpIIeW1x5fUYSYS8trjYaoxofi3d FhdZJfhWeJHgG5vzFyvZbIuVj26L/eINawkRr7jIrLjItLjIc3GRr8VF3OIik+Ii4+Iio+IiTnER u7iIVVxkWGJ3lKzIqhERre6kVw+LNCLDIomtIontIomdIolHRRKPiySeFEnsFkn8tUji5yKJp0US z4ok9ook9gv3Rj7rjTz2nLHnlD2f2fMre7rsOWHPMXuO2NNhT5s9LfYcsqeY++tLhH7ecmPRNVUv D6zoouosD6zoquo0D6zosupzHpjkuuoS7OsH7idwP3A/weQD9xOMP3A/wegD9xM4H7ifwP7A/QTW B+4nGH7gfoK8vS/F9xPUDXW1n0DS3Vv2E+jZ/QT15X6C+qH3E9TZfgIRCbftJxCuOG/bTzAss59A eJX7tv0Ewvm0bfsJhBcdb9tPILx3dtt+AuE1oNv2EwhvZdy2n0B4Sd62/QTCO8uSniMjJbw9Kum8 MlLCa3yS/jMjJbxPJenCM1LCiy2SKCIjJbxhAOMkGB4s39no3t1eyQwWAid89yys+av3SyRHDjPX c2eLGXkaL8zA9CLHsZ8ICefwuw3loijk94vz0LHOMUMytvAXwGbmG4M1X0x3iu/9EteGx80dGZqe /era0F6dDKG18nx89ydwSDQxPaLiC76Jyk+sfeR5k9dMVeOZ/dF/Bee/OAExSeRH0GaDe7SW0lSM 
ptYgQ2iLn6VeoAEDFjNEXnkZW95R4M+INXGs57kPjaUI6dqdgngYOTPyw8ymP7RJlx1ngcBDKDDX CaCxRm/Eh1rwV5CiiUMWnu0E03dMyVt1gaq/rwf4XqjHXjgarfSK1wb+vB60SQ9FQXvggzEnb6Mw hfGJHcjhe9N3YX4Dx0H9C28RAhj/mswcKOv3NlGbjWfwoCN88+2Abnv+6i8C6NttEkZmgM6rEXLp z2ZuhG9uOsELcKiOh6b4ni2a9rr6+0EjUA6+h1NopxwAA4dYiYT42Shs83LE1wtXfmB9vR/Y7CU6 24xMfHdO+OZc1pHqv4d/oghkf0Wg/QcUwb3jhk6AVQzajzY0buBYqE/4JVYyP5iZEflBqzV+4ArA B56Nr03LGbwGz1vDDSOLgcSayRz6hlm4DIfhb6ckdL87xKAt6E+TZCM3CCPetBIK4QOeRMBe8YS4 1YPeRNX5d0MzsiakpSj8o8VL2Bw7RFPSUvybYiazJhmzziAwPgYCTP0xOYE/i14+XsN6ZD4M6rge MGErAPjCqWfOnAz5Wm2lnlvd9A+rbltU7a5uq/7faCs6/BdLhpNFhGcufG7UG/qXNvvMpNlBDNjp xhCBM4ROIItCjTX93iiEXnJqQsmD77DDxdUT582NUq17nujCc97QBfzF8WkqD6FA/nfeq0DpjlH4 ZB74EDWEkT+fO3ZNWhhqh4MHJ0D7AB29ydqdNeGEKJgtVq884P8YOZ8a9EDSehtii9jL7yFgw2Cv ptfoj/BHDCCwWavtEFi2ghmxU2ZhCDUMKt7P5BzzeP48C8e8jVzL5xa4P1xv8UagXEKsCWqtUVP1 s8BSz1gy/Bc5wb7+F/b5EzkZW9YyOdhQUzEsa1AKgzzwMDQnU8cMHUjYdYauydPQM5V++kR+pGTQ 65O/gM1oFNikMc5ScnE/eECcpsBabO3xVWeoKU6bxSA/45ER57gpIPCJQPri5m5wBh56cfFF6fnk PXQtKKP7Tg9alXlbIM3FHUNV2uykmfQPOUt/1Rrh2T0nC3Yg2adysBwjA2syTSfAE6xIdklgZ8Ne qnwEMF13hN0c2Ry4vCOWGFlYh8HiEUCs6yqL7KwhG3FhcuTbPwdlgJ1NSoySryq5eORYG8DwFf0A YGcTeFTdYpqgrYAb6tLHcqRwPPtpPvKeoH5C89bQ63W11RDIdHs3ZI52e8LGlJ8idT/o9tHC6069 BRGGApEIZFxlf+30RCZyjL8H3QfSbV53MduIUb8EDEoUQ4VRJfszbelN1sjhd6Q3uH4g/KfVlNNx Db+4DpXbeY12aokO/HNlHd1lPvQm6NCNK8ryQZQO/PAnll/8o5Cb24c/UGMdugD5fAyYjiuF+0pX pOQ6/ZtLbpvG899R0j7GP1fOf+/y+tc4/xrToWXK8e6qx1JU0gEgF3H+daajRdd0XFTOx+C+w8tR 7zZQB6XcV51eF2F+6/R6V/dk+bP8A/8kp+O3/lWs46rJfWWs5QNTKNXykfBRv7pAHfWmzvNBeuSG kP7dX1f3t3d/lcwHeqlN+n/3gMhn/+AEY/+6hcEQEbFymzQtIU2X0upSmpaQ1gpKczPjDChnLL4o I5h0nGVkNbWwMI2zvRQ9a0iCXEDAOIORS+hEizmMr5glytrPWfxb1rItoJSsYySgspb+y/cc0r++ JTBMHTuhMDaFTq9D0j+MkfijK8IOk0lraiKKEkyaKrpRb4qlb3FyYsoqHJdg0nKddc9/YRPA3zHD bMzCso3DTMe0JsyfohjBhAHxEybEMOGz/iVZ91r5T5gFcFh7w3UwemsVkFTrPOPEMPSGYUhKpl2m aUqr3pCRpO1l6gLOjg/N7J09uDMYid/ckT6MrCFge4MmWqSWC//BWpoT05q7T679WXlT6BcY3s9d 
K/64OrdY1Pj2YazqhCGU9I/QN2E1WszPLvuPch3DpiFq1hBaxhBRw7lLt5bVrZbRrZbUrWd1a2V0 ayV11zO6DTAlOYe7bCk2soj16ojNLGKjOqKRRWwWRby545DL8jNtGwYsIXxKRnlfyDh0n4Zm6HxW vohQOR4kbJPYLFxyjkHJCvSUHYCrnKlyxQ2h/dPg/vLp7s97XI7Dw1fh+YSnkCpkPPWH5pR9UIk9 muL/krnfDdtKw7bIxB1P+KaGyshUT0PDJ4bt2GPhEFQMXc9A1+WhBw6fWWVsCPwF+xD5ZDQ1Iymr WATu2tiCU8XG0TclyJk2K3RbJtLgs9hsxuOk1+k+8HVVnKfDvR7ueMF3tRDX42sQ8G+RaXwnC8Li +czAQTY9GS6XbfBQ2xFNJgLG5rxNkimSdsxUoed6fa7oFfF0At1GCKMDMvGj+XQxZp9Frd/VPaTK mKtSPGcbr0HAewrwME9izRdsjkkUBC7cKR5mi2HM1A2jEPzFgzi2xnJKZv7QZaeCjqGc53ySuEbI A1t7npsY45EGFJ/RFM2B9v2pa70zTe049BJIxHPZVqXJ0xvPjVxz6n5H28FxP4qI1b/pxisjLMBz vChwMZcsvDxhbmmz/Raa2mzEXhcV+9Vb5Hg4eXvZuxuQd4j82mxXgEDsYXBJLLB9GLANVubYdL0w YrVHFJ6aAVsuQoQFyLCs2AsHayl8F+KX7541CXzP/Z5d0diCFkHYVbPapMtO/ARLcDRQo1Qlvd++ 47mrvDsWzVpd+l7oTx18dWIKVYv8+Wvnf4kBQdyuxYZE0OKC5HMUva/iNJG2ZMnMhPoQLQInvUV9 h0j/EWPbRID8QpZzq2xdkKp4TLEECK0K0ku2FOiGUm8ZyvO5WjdUSg39ObX75ERtKOpzsg3BgpEE sLLeUhuQKpkAPSWUas1n1hic8i0KLlQJEWMvY9rxBbup+R4vvUYsAg/njuWO8LBf1viDa0m9oWi1 hgrDyrHfu+kPyMl0/vVnIAo1qNoQqetiNXvnd6Jsr320pWuKvqqAGmRVq9d1HQaAUnXwBkdaZ7s1 qEqrSetqporTRrPZVGlDTgPbY5KnoS4a6LCG/Q9KbsD/7HaYhv47OWnoXP85toCfTkk39tOWP0rh qwk6rl2XgCDKufxkE1PZT9asVqH7TbctKbsUucQdVTJyGAWYU7YxAgfUeL/4qxtZkzhMeeyzuiIV qTCdgfPi8nVGnBEEOkvFJtg9T3mMxGuNm9z4EIoaomVDS9VaHRoObGZXQCJpQfZ5jCTMPw5sUWLZ vBN6rsdTjW+i8eZGrysSKNPeKDVN2WxvFFrXNbmK8B9f0ei5/LzsPiqaqNS6C3N6xlLj9PXdHDjn eyfR7FMKTFVV6PkiZ86iR0U0hNsXedUVeUUTFhvkFQmUI6/a2kZe3aj/l5BXPZdfFsgjr5xs8V7i /xF5tRV5RSZskFckULLlpVtb3rr+X0Je7Vx+VWof5P03tLwXUMTjSUQW83hiQpDec6IntmtwbuLN PRSUMR8LxN78gN/fgJMy4LBpQrihA2NbttExXMzY3vTRwmM7Hdtk7HhO4FpPYSiqPPCTSgyfjYZq 1HBTUo/tv5exjtuzVTs5WcMTkUl4S4nc4gdO3UXvc4fMLXfbFSs75jIub5JotNe7vLu9vvkVJ7FS e32cEf+RAbr1yWzGZ9fI3A9DF0c3EBXH+2cVpU1FizGyOKL1q3hyEwPqeQDBciA5KcDlTuLtqSEZ KGRAyUAjA2Gvy0W5M+OJcJxxXAb1yUSo3FrW5Q25x1cRLgLXHjvkM3yhfCEnbC5RPI3I3MhupJqb 7DCPIYc5W90l16jJbU7ZerXd7e8dMObm/p8h8JNQhCC09Yn8j3K6XDKQ26O0A/5iv/CX2+FpJZd0 
6fkYQYSsvjjfbAxvfC3ZF+eZJXjKPUnwxMb8+UrSfbH03n2xm79jLSDi5BeZ5xeZ5Rd5zC/yPb+I nV/kf5u7lqa2YSD8VzQ5wQxlHNvKw7dOgUuHdqa0HW6MH3LixJapnFDor+9KJi8w3hV10lzEYHa/ T9YuXlnatab2KhN7ldReRdirJPYqsb1K9I7sKKrK5iGC7e5s7x7aPEQiG+HYRjixERY2wqmN8MRG eGojnNkIz2yE5zbCuY1wYSMsbYRL62hUmmgkTVuYNjft3LQz02amnZp2YtrUtMK0iWlj00amxX3/ 5RZh2bbdaLunKtvAbDdVizYw213VvA3Mdlt13gZG3Fddg806zCfIOswnmHaYTzDpMJ8g7TCfQHSY T5B0mE8Qd5hPEHWYT9CW+2KfT8BH7iafgDjcDfkE/m4+AV/nE/BD5xNwk0+AOWFTPgG649yUTxC9 J58APcq9KZ8AXU9ryidADzpuyidAz51tyidAjwFtyidAT2VsyidAD8lryidAzyxbRY4dLfT0qFXw 2tFCj/FZxc8dLfQ8lVUI39FCD7ZYzSJ2tNATBvQ8CV4P1jUbF1+/XGIvC1dZLqqnaiEK1iuSfi9g F+Y7Cfp9IwqVyoSCp4As119LqGtbFlPBlhJeyvMnLVk/LhCq26sbXXAoTSVLuuHFF51/Xt0E7Fqr ArsqoTMnj2m1hXFqvvRQyvwJvV8lhOZfymUFYPVlVoiiVE8Bc4eDOUtBBCupOuCwzWflUkHQSFi1 CJUevHPGPpVFkS10SaBQDxBQuP4aRykTbD3l8va7x8AOpdRrM2c1gI5IzyQE9Q9pFdR21HVrm3Ew QaRUianOSsJFqIuy0JKs3YF0/4//YaFtfybwDm8Ct/8ycIusEkr/i8GUIGC/Q1W/aSlzPa3u7kNV ibvyXk9egHgp51KXXP9alsCgi3BDsM69iLM0E8k2mz8IOMKWwl0ka6oVWs87H/Tq24ERl4mu/n09 PAT4up701ZDYgTwzMxiHsKjWszr42xmrsj+CjfpjCAsrsTRT1YJFMFuDRzhEQV1QbyoVYfolIVi5 fn0tChfxlI0dp/41rv0pnAjmOdta9RW7LsdTEc/1rRsIPc0Dd8vLCTuBP5/aYf0wY6i4XtaemoVs XTcpw0Jsu7o/DDznYJ7F/cAZ7c+zCPC4ZxFAjs2zCF0mexYBi+JZ8Agcb4DgfeH647fP8HNLwOUB R1zv/c5AgsecgQRyXM5A6jLRGUhYNGfwnMDfo60J8LitCSDHZmtCl8m2JmDRbM1HQX/4AkimFbzl 5CEMBwA+CGU2W8Vjttias7epLqV41PPI+jsT+db0aU3tOsgzBwRcP+CvVkA68kMaPOKHNJCj8kNa l2l++Izl/bMfaqBB4OzT1jg8wdY4yNHZGu8y3dY4FtHWQ3TW0tU0tmbj/h49C4cneBYOcnSehXeZ 7lk4Fs2zPCykcBAYvy3QN0GpVcDDBBAKF6NwMQoXo/AwCg+j8FopRsxxWimMQBuFEUAo2m1hBPTa jdMmwNsFjC3aBRAKF6PwMAoPo/AwCh+j8DEKH6PgGAXHKDhGMcAoBhjFAKMYvkXxF84QnduKBRUA ------=_Part_35744_18065873.1194617364213-- From owner-xfs@oss.sgi.com Sat Nov 10 12:52:03 2007 Received: with ECARTIS (v1.0.0; list xfs); Sat, 10 Nov 2007 12:52:07 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-8.4 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham 
version=3.3.0-r574664 Received: from mx1.suse.de (mx1.suse.de [195.135.220.2]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAAKq1xu000587 for ; Sat, 10 Nov 2007 12:52:03 -0800 Received: from Relay1.suse.de (mail2.suse.de [195.135.221.8]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.suse.de (Postfix) with ESMTP id 660A51AA4C; Sat, 10 Nov 2007 21:52:07 +0100 (CET) From: Andreas Gruenbacher Organization: SUSE Labs To: Timothy Shimmin Subject: Re: acl and attr: Fix path walking code Date: Sat, 10 Nov 2007 21:52:05 +0100 User-Agent: KMail/1.9.6 (enterprise 20070904.708012) Cc: linux-xfs@oss.sgi.com, Gerald Bringhurst , Brandon Philips References: <200710281858.24428.agruen@suse.de> <4733F301.9020706@sgi.com> In-Reply-To: <4733F301.9020706@sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200711102152.05619.agruen@suse.de> X-Virus-Scanned: ClamAV 0.91.2/4745/Sat Nov 10 02:50:27 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13607 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: agruen@suse.de Precedence: bulk X-list: xfs On Friday 09 November 2007 06:41:21 Timothy Shimmin wrote: > I applied attr patch and tried it out on xfstests/062 > (which I believe was based on one of your tests). 
> > ========================================================== > --- 062.out 2006-03-28 12:52:32.000000000 +1000 > +++ 062.out.bad 2007-11-09 15:38:09.000000000 +1100 > @@ -526,6 +526,10 @@ > user.name=0xbabe > user.name3=0xdeface > > +# file: SCRATCH_MNT/lnk > +trusted.name=0xbabe > +trusted.name3=0xdeface > + > # file: SCRATCH_MNT/dev/b > trusted.name=0xbabe > trusted.name3=0xdeface > @@ -562,6 +566,10 @@ > user.1=0x3233 > user.x=0x797a > > +# file: SCRATCH_MNT/descend/and/ascend > +trusted.9=0x3837 > +trusted.a=0x6263 > + > > *** directory descent without following symlinks > # file: SCRATCH_MNT/reg > ========================================================== > > So for the following of symlinks with getfattr -L > i.e. > echo "*** directory descent with us following symlinks" > getfattr -h -L -R -m '.' -e hex $SCRATCH_MNT > > Looking at the 2nd difference... > It now picks up descend/and/ascend which contains the symlink > of descend/and --> here/up. > So that makes sense, it is following a symlink which it > didn't before and finding a dir, "up" in the linked dir. > Good. > > Looking at 1st difference... > It is now showing up "lnk" which is a symlink: lnk --> dir > So why is it showing this up > and yet it is not showing descend/and (which is a link to here/up)? > So yes we are following symlinks but are we supposed > to just do the symlinks themselves as well? With -h, the utilities operate on the symlinks rather than the files that the symlinks point to. The test case sets attributes on SCRATCH_MNT/lnk, but not on descend/and. The -h and -L options together don't make much sense actually. > BTW, do we not allow user EAs on symlinks? (I've forgotten) No we don't --- that's explained on attr(5). Thanks for looking at this! 
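The distinction drawn above, between which entries a recursive walk visits and whether a symlink or its target is examined, can be sketched outside of getfattr. The following is only an illustrative analogy in Python and not how the attr utilities are implemented: the names walk_tree, follow_links, no_deref and build_demo_tree are hypothetical, and the demo tree (dir/file plus a symlink lnk pointing at dir) is an assumption loosely modelled on the test case.

```python
import os
import stat
import tempfile


def walk_tree(root, follow_links, no_deref):
    """Return sorted (relative_path, kind) pairs for every entry below root.

    follow_links: recurse into symlinked directories (roughly -L) or not (roughly -P).
    no_deref:     examine the symlink itself (roughly -h) instead of its target.
    Both parameter names are hypothetical and exist only for this sketch.
    """
    seen = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=follow_links):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat() looks at the link itself, stat() at whatever it points to.
            st = os.lstat(path) if no_deref else os.stat(path)
            if stat.S_ISLNK(st.st_mode):
                kind = "symlink"
            elif stat.S_ISDIR(st.st_mode):
                kind = "dir"
            else:
                kind = "file"
            seen.append((os.path.relpath(path, root), kind))
    return sorted(seen)


def build_demo_tree():
    """Create dir/file plus a symlink lnk -> dir in a fresh temporary directory."""
    root = tempfile.mkdtemp()
    os.mkdir(os.path.join(root, "dir"))
    open(os.path.join(root, "dir", "file"), "w").close()
    os.symlink("dir", os.path.join(root, "lnk"))
    return root


if __name__ == "__main__":
    root = build_demo_tree()
    print("physical walk:", walk_tree(root, follow_links=False, no_deref=True))
    print("logical walk: ", walk_tree(root, follow_links=True, no_deref=False))
```

In the physical walk, lnk is reported once as a symlink and nothing below it is visited. In the logical walk, lnk is reported as a directory and lnk/file appears as well. The two switches are independent of each other, which is why combining the equivalents of -h and -L gives a result that is hard to interpret.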
Andreas From owner-xfs@oss.sgi.com Sat Nov 10 13:36:52 2007 Received: with ECARTIS (v1.0.0; list xfs); Sat, 10 Nov 2007 13:36:56 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-6.8 required=5.0 tests=AWL,BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.0-r574664 Received: from mx2.suse.de (ns2.suse.de [195.135.220.15]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAALaoLv005890 for ; Sat, 10 Nov 2007 13:36:51 -0800 Received: from Relay2.suse.de (mail2.suse.de [195.135.221.8]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx2.suse.de (Postfix) with ESMTP id B81B327B4B; Sat, 10 Nov 2007 22:36:54 +0100 (CET) From: Andreas Gruenbacher Organization: SUSE Labs To: Timothy Shimmin Subject: Re: acl and attr: Fix path walking code Date: Sat, 10 Nov 2007 22:36:52 +0100 User-Agent: KMail/1.9.6 (enterprise 20070904.708012) Cc: linux-xfs@oss.sgi.com, Gerald Bringhurst , Brandon Philips References: <200710281858.24428.agruen@suse.de> <47340ECC.4000205@sgi.com> In-Reply-To: <47340ECC.4000205@sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200711102236.53298.agruen@suse.de> X-Virus-Scanned: ClamAV 0.91.2/4745/Sat Nov 10 02:50:27 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13608 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: agruen@suse.de Precedence: bulk X-list: xfs On Friday 09 November 2007 08:39:56 Timothy Shimmin wrote: > You mention -L/-P is like chown. > However, -P for getattr isn't about not walking symlinks > to directories, > it's about skipping symlinks altogether, right? Hmm, -L and -P define which files and directories are visited, and -h defines whether we are looking at symlinks or the files they point to. 
The two concepts are orthogonal. -P is not about skipping symlinks, only about not recursing into them. You can do this (as root), for example:

$ ln -s dead link
$ setfattr -h -n trusted.name -v value link
$ getfattr -h -m- -d -P link
# file: link
trusted.name="value"

With "getfattr -R -P -h" you get a physical dump of all attributes ("a real, complete dump"), while with "getfattr -R -L" you get a logical dump that treats all symlinks as the files they point to. I somewhat doubt that -L with -h has real value. Thanks, Andreas From owner-xfs@oss.sgi.com Sat Nov 10 16:00:12 2007 Received: with ECARTIS (v1.0.0; list xfs); Sat, 10 Nov 2007 16:00:18 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: ** X-Spam-Status: No, score=2.5 required=5.0 tests=AWL,BAYES_50 autolearn=ham version=3.3.0-r574664 Received: from ishtar.tlinx.org (ishtar.tlinx.org [64.81.245.74]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAB00Abr019076 for ; Sat, 10 Nov 2007 16:00:12 -0800 Received: from [192.168.3.11] (Athena [192.168.3.11]) by ishtar.tlinx.org (8.13.3/8.12.10/SuSE Linux 0.7) with ESMTP id lAANciJe013460 for ; Sat, 10 Nov 2007 15:38:44 -0800 Message-ID: <47364104.8020106@tlinx.org> Date: Sat, 10 Nov 2007 15:38:44 -0800 From: Linda Walsh User-Agent: Thunderbird 1.5.0.13 (Windows/20070809) MIME-Version: 1.0 To: Linux-Xfs Subject: minor CPU wake-up question Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4745/Sat Nov 10 02:50:27 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13609 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: xfs@tlinx.org Precedence: bulk X-list: xfs I recently ran into "powertop" (fr. lesswatts.org) that shows how often interrupts awaken a processor under a tickless kernel. The display indicates the counts are over a 10 second period.
Barring any disk activity, why would xfsbufd wake up each copy of itself when there doesn't seem to be anything to do? Is a separate process really needed for each partition (that seems to be the case)? I don't know if it is 1 interrupt/bufd or 6 on 1, but it is fairly constant with 6 interrupts each period. FWIW, dirty_writeback_centiseconds is set to 1500(1499) and makes no difference in the count. It doesn't seem to be a big deal, other than it is at the top of the interrupt-chart with usually 60% or more of the ticks. Might be nice to not have it on top if it isn't necessary...

63.2% ( 6.0) xfsbufd : schedule_timeout (process_timeout)
21.1% ( 2.0) : clocksource_register (clocksource_watchdog)
2.1% ( 0.2) : __netdev_watchdog_up (dev_watchdog)
2.1% ( 0.2) : page_writeback_init (wb_timer_fn)
2.1% ( 0.2) : neigh_table_init_no_netlink (neigh_periodic_
2.1% ( 0.2) init : schedule_timeout (process_timeout)
1.1% ( 0.1) xfssyncd : schedule_timeout (process_timeout)
1.1% ( 0.1) : init_nonfatal_mce_checker (delayed_work_time
1.1% ( 0.1) cron : do_nanosleep (hrtimer_wakeup)
1.1% ( 0.1) nscd : schedule_timeout (process_timeout)
1.1% ( 0.1) irqbalance : do_nanosleep (hrtimer_wakeup)
1.1% ( 0.1) : __neigh_event_send (neigh_timer_handler)
1.1% ( 0.1) : dst_run_gc (dst_run_gc)
----
From owner-xfs@oss.sgi.com Sat Nov 10 20:48:01 2007 Received: with ECARTIS (v1.0.0; list xfs); Sat, 10 Nov 2007 20:48:40 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS autolearn=ham version=3.3.0-r574664 Received: from sandeen.net (sandeen.net [209.173.210.139]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAB4lu7o016633 for ; Sat, 10 Nov 2007 20:48:01 -0800 Received: from liberator.sandeen.net (liberator.sandeen.net [10.0.0.4]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by
sandeen.net (Postfix) with ESMTP id 8F7BD180286BE; Sat, 10 Nov 2007 22:48:00 -0600 (CST) Message-ID: <4736897F.4070202@sandeen.net> Date: Sat, 10 Nov 2007 22:47:59 -0600 From: Eric Sandeen User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728) MIME-Version: 1.0 To: Linda Walsh CC: Linux-Xfs Subject: Re: minor CPU wake-up question References: <47364104.8020106@tlinx.org> In-Reply-To: <47364104.8020106@tlinx.org> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4746/Sat Nov 10 15:11:53 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13610 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: sandeen@sandeen.net Precedence: bulk X-list: xfs Linda Walsh wrote: > I recently ran into "powertop" (fr. lesswatts.org) that shows how often > interrupts awaken a processor under a tickless kernel. > > The display indicates the counts are over a 10 second period. > Barring any disk activity, why would xfsbufd wake up each copy of itself > up when there doesn't seem like there would be anything to do. > > Is a separate process really needed for each partition (that seems to be > the case)? I don't know if it is 1 interrupt/bufd or 6 on 1, but > it is fairly constant with 6 interrupts each period. > FWIW, dirty_writeback_centiseconds is set to 1500(1499) and makes > no difference in the count. > > It doesn't seem to be a big deal, other than it is at the top of the > interrupt-chart with usually 60% or more of the ticks. Might be nice > to not have it on top if it isn't necessary... I think by default, xfsbufd wakes up each second for each filesystem. See fs.xfs.xfsbufd_centisecs, default 100, or 1s. I think powertop is reporting wakeups/second, so it looks like you have 6 filesystems mounted? Honestly if that's your most frequent entry, you're probably doing pretty well...
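The arithmetic behind that estimate can be written out as a small sketch. It assumes, as suggested above, one xfsbufd thread per mounted XFS filesystem, each waking once per fs.xfs.xfsbufd_centisecs interval, and it takes powertop's figure to be wakeups per second. The function name wakeups_per_second is hypothetical, and the 1000-centisecond value in the demo is an arbitrary illustration rather than a recommended setting.

```python
def wakeups_per_second(num_filesystems, xfsbufd_centisecs=100):
    """Estimate timer wakeups per second attributable to xfsbufd.

    Assumption (from the discussion, not verified against kernel source):
    one xfsbufd thread per mounted XFS filesystem, each waking once every
    xfsbufd_centisecs hundredths of a second.
    """
    if xfsbufd_centisecs <= 0:
        raise ValueError("xfsbufd_centisecs must be positive")
    interval_seconds = xfsbufd_centisecs / 100.0
    return num_filesystems / interval_seconds


if __name__ == "__main__":
    # Six filesystems at the default interval of 100 centiseconds (1 s).
    print(wakeups_per_second(6))
    # The same six filesystems with a hypothetical, longer interval.
    print(wakeups_per_second(6, 1000))
```

Under these assumptions six mounted filesystems at the default interval account for the 6.0 shown next to xfsbufd in the powertop listing, and a longer interval would reduce the figure proportionally.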
-Eric From owner-xfs@oss.sgi.com Sat Nov 10 22:18:42 2007 Received: with ECARTIS (v1.0.0; list xfs); Sat, 10 Nov 2007 22:18:46 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-0.2 required=5.0 tests=BAYES_40 autolearn=ham version=3.3.0-r574664 Received: from mail9.dslextreme.com (mail9.dslextreme.com [66.51.199.94]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAB6Iebp023815 for ; Sat, 10 Nov 2007 22:18:42 -0800 Received: (qmail 28268 invoked from network); 11 Nov 2007 05:52:05 -0000 Received: from unknown (HELO [10.1.0.3]) (66.245.244.201) by mail9.dslextreme.com with (RC4-MD5 encrypted) SMTP; Sat, 10 Nov 2007 21:52:05 -0800 Message-ID: <4736987E.4000208@danbala.com> Date: Sat, 10 Nov 2007 21:51:58 -0800 From: Dirck Blaskey User-Agent: Thunderbird 2.0.0.6 (Windows/20070728) MIME-Version: 1.0 To: xfs@oss.sgi.com Subject: files getting truncated on xfs Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4746/Sat Nov 10 15:11:53 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13611 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: listtarget2@danbala.com Precedence: bulk X-list: xfs Although there was a truncation bug listed as introduced in 2.6.21 and fixed before 2.6.22, I had a number of truncations occur with Ubuntu Gutsy Gibbon (kernel 2.6.22) AMD64 build using XFS on RAID 1 partition, so apparently it's not fixed. This looked very much like the bug described previously - a number of files truncated to block boundaries on restart. I've dropped XFS for Ext3 and AMD64 for I386 in the hopes of finding some stability. 
best regards, d From owner-xfs@oss.sgi.com Sun Nov 11 08:41:41 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 11 Nov 2007 08:41:48 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-0.4 required=5.0 tests=AWL,BAYES_20,J_CHICKENPOX_45, SPF_HELO_FAIL autolearn=no version=3.3.0-r574664 Received: from mxmail.synplicity.com (synvpn.synplicity.com [209.157.48.1]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lABGfdFW006356 for ; Sun, 11 Nov 2007 08:41:41 -0800 X-IronPort-AV: E=Sophos;i="4.21,402,1188802800"; d="scan'208";a="569970" Received: from mailhost.synplicity.com (HELO synplcty.synplicity.com) ([209.24.66.180]) by mxmail.synplicity.com with ESMTP; 11 Nov 2007 08:41:45 -0800 Received: from [63.110.200.37] (localhost [127.0.0.1]) by synplcty.synplicity.com (8.13.1/8.12.11) with ESMTP id lABGfiTX005818 for ; Sun, 11 Nov 2007 08:41:45 -0800 (PST) Message-ID: <47373142.3010407@synplicity.com> Date: Sun, 11 Nov 2007 08:43:46 -0800 From: Chris Eddington User-Agent: Thunderbird 2.0.0.6 (Windows/20070728) MIME-Version: 1.0 To: xfs mailing list Subject: xfs_repair output interpretation Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4749/Sun Nov 11 06:32:53 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13612 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: chrise@synplicity.com Precedence: bulk X-list: xfs Hi, I could use some help interpreting xfs_repair -n output from a restored 2-port SATA failure on a RAID5 array. I've got all disks up and running, but in a RAID5 degraded state (3 out of 4 disks). The question is, how much of the data is lost (vs. restored correctly)? From the output now it looks like everything is lost (cleared inodes, etc.) 
, but I was only expecting just recent data operations to be impacted (the machine was pretty much idle when the ports failed). If this really is full data loss below, then I'll need to go back and examine my RAID configuration more carefully. I'm using xfs_repair v2.8.18 on Ubuntu Linux. Thks, Chris ----------------- xfs_repair -n /dev/md0 - creating 4 worker thread(s) Phase 1 - find and verify superblock... - reporting progress in intervals of 15 minutes Phase 2 - using internal log - scan filesystem freespace and inode maps... bad on-disk superblock 2 - inconsistent filesystem geometry in realtime filesystem component primary/secondary superblock 2 conflict - AG superblock geometry info conflicts with filesystem geometry would reset bad sb for ag 2 bad uncorrected agheader 2, skipping ag... bad on-disk superblock 24 - bad magic number primary/secondary superblock 24 conflict - AG superblock geometry info conflicts with filesystem geometry bad flags field in superblock 24 bad shared version number in superblock 24 bad inode alignment field in superblock 24 bad stripe unit/width fields in superblock 24 bad log/data device sector size fields in superblock 24 bad magic # 0xc486a1e7 for agi 24 bad version # 127171049 for agi 24 bad sequence # 606867126 for agi 24 bad length # -48052605 for agi 24, should be 11446496 would reset bad sb for ag 24 would reset bad agi for ag 24 bad uncorrected agheader 24, skipping ag... - 10:49:34: scanning filesystem freespace - 30 of 32 allocation groups done - found root inode chunk Phase 3 - for each AG... - scan (but don't clear) agi unlinked lists... error following ag 24 unlinked list - 10:49:34: scanning agi unlinked lists - 32 of 32 allocation groups done - process known inodes and perform inode discovery... 
- agno = 0 - agno = 1 - agno = 2 - agno = 3 - agno = 4 imap claims a free inode 268435719 is in use, would correct imap and clear inode bad nblocks 23 for inode 268435723, would reset to 13 corrupt block 0 in directory inode 259 would junk block no . entry for directory 259 no .. entry for directory 259 - agno = 5 - agno = 6 - agno = 7 - agno = 8 attribute entry 0 in attr block 0, inode 2147610149 has bad name (namelen = 0) problem with attribute contents in inode 2147610149 would clear attr fork bad nblocks 11 for inode 2147610149, would reset to 10 bad anextents 1 for inode 2147610149, would reset to 0 attribute entry 0 in attr block 0, inode 2147610376 has bad name (namelen = 0) problem with attribute contents in inode 2147610376 would clear attr fork bad nblocks 13 for inode 2147610376, would reset to 12 bad anextents 1 for inode 2147610376, would reset to 0 - agno = 9 - agno = 10 - agno = 11 imap claims in-use inode 2173744652 is free, would correct imap data fork in ino 2423071372 claims free block 201330859 data fork in ino 2423071372 claims free block 201330860 data fork in ino 2423071372 claims free block 201330861 data fork in ino 2423071372 claims free block 201330862 data fork in ino 2423071372 claims free block 201330863 data fork in ino 2423071375 claims free block 268470033 data fork in ino 2423071375 claims free block 268470034 data fork in ino 2423071375 claims free block 268470035 data fork in ino 2423071375 claims free block 268470036 data fork in ino 2423071375 claims free block 268470037 data fork in ino 2423071375 claims free block 268470038 data fork in ino 2423071376 claims free block 301992536 data fork in ino 2423071376 claims free block 301992537 data fork in ino 2423071376 claims free block 301992538 data fork in ino 2423071376 claims free block 301992539 data fork in ino 2423071376 claims free block 301992540 data fork in ino 2423071376 claims free block 301992541 imap claims a free inode 2423071393 is in use, would correct imap and 
clear inode imap claims a free inode 2423071394 is in use, would correct imap and clear inode imap claims a free inode 2423071395 is in use, would correct imap and clear inode imap claims in-use inode 2423071409 is free, would correct imap imap claims a free inode 2423071413 is in use, would correct imap and clear inode imap claims a free inode 2691467538 is in use, would correct imap and clear inode imap claims in-use inode 2691467544 is free, would correct imap data fork in ino 2691568164 claims free block 352325825 data fork in ino 2691568164 claims free block 352325826 data fork in ino 2691568164 claims free block 352325827 data fork in ino 2691568164 claims free block 352325828 data fork in ino 2691568164 claims free block 352325829 data fork in ino 2691568164 claims free block 352325830 data fork in ino 2691568164 claims free block 352325831 data fork in ino 2691568166 claims free block 385877409 data fork in ino 2691568166 claims free block 385877410 data fork in ino 2691568166 claims free block 385877411 data fork in ino 2691568166 claims free block 385877412 data fork in ino 2691568166 claims free block 385877413 data fork in ino 2691568166 claims free block 385877414 data fork in ino 2691568166 claims free block 385877415 data fork in ino 2691568170 claims free block 184554409 data fork in ino 2691568170 claims free block 184554410 data fork in ino 2691568170 claims free block 184554411 data fork in ino 2691568170 claims free block 184554412 data fork in ino 2691568170 claims free block 184554413 data fork in ino 2691568170 claims free block 184554414 imap claims in-use inode 2691568170 is free, would correct imap data fork in ino 2691568173 claims free block 251661537 data fork in ino 2691568173 claims free block 251661538 data fork in ino 2691568173 claims free block 251661539 data fork in ino 2691568173 claims free block 251661540 data fork in ino 2691568173 claims free block 251661541 data fork in ino 2691568173 claims free block 251661542 data fork 
in ino 2691568174 claims free block 285214025 data fork in ino 2691568174 claims free block 285214026 data fork in ino 2691568174 claims free block 285214027 data fork in ino 2691568174 claims free block 285214028 data fork in ino 2691568174 claims free block 285214029 data fork in ino 2691568174 claims free block 285214030 data fork in ino 2691568177 claims free block 318768281 data fork in ino 2691568177 claims free block 318768282 data fork in ino 2691568177 claims free block 318768283 data fork in ino 2691568177 claims free block 318768284 data fork in ino 2691568177 claims free block 318768285 data fork in ino 2691568177 claims free block 318768286 data fork in ino 2691568177 claims free block 318768287 imap claims in-use inode 2691568177 is free, would correct imap imap claims in-use inode 2691568178 is free, would correct imap imap claims in-use inode 2691568180 is free, would correct imap imap claims in-use inode 2691568181 is free, would correct imap imap claims in-use inode 2691568183 is free, would correct imap imap claims in-use inode 2691568184 is free, would correct imap imap claims in-use inode 2691568185 is free, would correct imap - agno = 12 - agno = 13 - agno = 14 - agno = 15 - agno = 16 - agno = 17 - agno = 18 - agno = 19 - agno = 20 - agno = 21 - agno = 22 - agno = 23 - agno = 24 - agno = 25 - agno = 26 - agno = 27 - agno = 28 - agno = 29 - agno = 30 - agno = 31 - 10:53:49: process known inodes and inode discovery - 191040 of 202304 inodes done - process newly discovered inodes... - 10:53:50: process newly discovered inodes - 64 of 32 allocation groups done Phase 4 - check for duplicate blocks... - setting up duplicate extent list... - 10:53:50: setting up duplicate extent list - 32 of 32 allocation groups done - check for inodes claiming duplicate blocks... 
- agno = 0 corrupt block 0 in directory inode 259 would junk block - agno = 1 bad nblocks 23 for inode 268435723, would reset to 13 entry "verif" at block 0 offset 480 in directory inode 268435726 references non-existent inode 536871184 would clear inode number in entry at offset 480... entry "rev_1" in shortform directory 268435746 references non-existent inode 536871208 would have junked entry "rev_1" in directory inode 268435746 entry "vhdl" in shortform directory 268435752 references non-existent inode 536871227 would have junked entry "vhdl" in directory inode 268435752 entry "blackbox_impl_1" in shortform directory 268435755 references non-existent inode 536871231 would have junked entry "blackbox_impl_1" in directory inode 268435755 entry "vhdl" in shortform directory 268435762 references non-existent inode 536871459 would have junked entry "vhdl" in directory inode 268435762 entry "aqm_sr_2add_fold" in shortform directory 268435765 references non-existent inode 536871469 would have junked entry "aqm_sr_2add_fold" in directory inode 268435765 entry "amplify" in shortform directory 268435768 references non-existent inode 536871472 would have junked entry "amplify" in directory inode 268435768 entry "vhdl" in shortform directory 268435769 references non-existent inode 536871475 would have junked entry "vhdl" in directory inode 268435769 entry "SynDSPparallel49_sync_ret" at block 0 offset 200 in directory inode 268435772 references non-existent inode 536871479 would clear inode number in entry at offset 200... entry "resynthesis" at block 0 offset 944 in directory inode 268435972 references non-existent inode 536871682 would clear inode number in entry at offset 944... entry "verif" at block 0 offset 368 in directory inode 268437003 references non-existent inode 536871690 would clear inode number in entry at offset 368... - agno = 3 entry ".." 
at block 0 offset 32 in directory inode 805306662 references non-existent inode 536871459 would clear inode number in entry at offset 32... no . entry for directory 259 no .. entry for directory 259 entry "syntmp" at block 0 offset 1408 in directory inode 291 references non-existent inode 536871482 would clear inode number in entry at offset 1408... entry "rev_1" at block 0 offset 208 in directory inode 268437017 references non-existent inode 536871692 would clear inode number in entry at offset 208... entry "test_fsm_arbiter" in shortform directory 268437038 references non-existent inode 536871721 would have junked entry "test_fsm_arbiter" in directory inode 268437038 entry "src" in shortform directory 268437039 references non-existent inode 536871722 would have junked entry "src" in directory inode 268437039 entry "slprj" at block 0 offset 1840 in directory inode 268437040 references non-existent inode 536871738 would clear inode number in entry at offset 1840... entry "src" in shortform directory 268437290 references non-existent inode 536871740 would have junked entry "src" in directory inode 268437290 entry ".." at block 0 offset 32 in directory inode 805307151 references non-existent inode 536871475 would clear inode number in entry at offset 32... entry ".." at block 0 offset 32 in directory inode 805307159 references non-existent inode 536871479 would clear inode number in entry at offset 32... - agno = 2 - agno = 4 entry ".." at block 0 offset 32 in directory inode 1073742080 references non-existent inode 536871168 would clear inode number in entry at offset 32... entry "object_2" in shortform directory 1073742083 references non-existent inode 536901176 would have junked entry "object_2" in directory inode 1073742083 entry ".." at block 0 offset 32 in directory inode 1073742098 references non-existent inode 536871208 would clear inode number in entry at offset 32... entry ".." 
at block 0 offset 32 in directory inode 805308172 references non-existent inode 536872499 would clear inode number in entry at offset 32... entry ".." at block 0 offset 32 in directory inode 805308201 references non-existent inode 536872508 would clear inode number in entry at offset 32... entry ".." at block 0 offset 32 in directory inode 805309952 references non-existent inode 536874804 would clear inode number in entry at offset 32... entry ".." at block 0 offset 32 in directory inode 805310005 references non-existent inode 536875066 would clear inode number in entry at offset 32... entry ".." at block 0 offset 32 in directory inode 805310210 references non-existent inode 536875070 would clear inode number in entry at offset 32... entry "verif" at block 0 offset 352 in directory inode 805310210 references non-existent inode 536875267 would clear inode number in entry at offset 352... entry ".." at block 0 offset 32 in directory inode 1073744910 references non-existent inode 536873218 would clear inode number in entry at offset 32... entry ".." at block 0 offset 32 in directory inode 805310515 references non-existent inode 536875326 would clear inode number in entry at offset 32... entry ".." at block 0 offset 32 in directory inode 805310729 references non-existent inode 536875541 would clear inode number in entry at offset 32... entry ".." at block 0 offset 32 in directory inode 805310766 references non-existent inode 536875806 would clear inode number in entry at offset 32... entry ".." at block 0 offset 32 in directory inode 805311009 references non-existent inode 536876306 would clear inode number in entry at offset 32... entry ".." at block 0 offset 32 in directory inode 805311249 references non-existent inode 536876575 would clear inode number in entry at offset 32... entry "pop" at block 0 offset 512 in directory inode 805311249 references non-existent inode 536909576 would clear inode number in entry at offset 512... 
entry "src" in shortform directory 268437291 references non-existent inode 536871948 would have junked entry "src" in directory inode 268437291 entry "src" in shortform directory 268437292 references non-existent inode 536871964 would have junked entry "src" in directory inode 268437292 entry "src" in shortform directory 268437293 references non-existent inode 536871980 would have junked entry "src" in directory inode 268437293 entry "rev_1" at block 0 offset 696 in directory inode 268437301 references non-existent inode 536872195 would clear inode number in entry at offset 696... entry "Retiming" in shortform directory 268437516 references non-existent inode 536872207 would have junked entry "Retiming" in directory inode 268437516 entry "FIXED POINT ARITHMETIC" in shortform directory 268437523 references non-existent inode 536872214 would have junked entry "FIXED POINT ARITHMETIC" in directory inode 268437523 a long list of this stuff .... disconnected dir inode 4130509338, would move to lost+found disconnected inode 4136546817, would move to lost+found disconnected inode 4136546818, would move to lost+found disconnected inode 4136546820, would move to lost+found disconnected inode 4136546821, would move to lost+found disconnected inode 4136546823, would move to lost+found disconnected inode 4136546824, would move to lost+found disconnected inode 4136546826, would move to lost+found disconnected inode 4136546827, would move to lost+found disconnected dir inode 4179320356, would move to lost+found disconnected dir inode 4179320371, would move to lost+found disconnected dir inode 4180337155, would move to lost+found disconnected dir inode 4180337177, would move to lost+found disconnected dir inode 4180337199, would move to lost+found Phase 7 - verify link counts... would have reset inode 268435723 nlinks from 65535 to 1 another long list of these ... 
would have reset inode 268435726 nlinks from 4 to 3 would have reset inode 268435746 nlinks from 3 to 2 would have reset inode 268435752 nlinks from 4 to 2 would have reset inode 268435755 nlinks from 3 to 2 would have reset inode 268435762 nlinks from 3 to 2 would have reset inode 268435765 nlinks from 4 to 2 would have reset inode 268435768 nlinks from 3 to 2 would have reset inode 268435769 nlinks from 3 to 2 would have reset inode 268435772 nlinks from 6 to 5 would have reset inode 268435972 nlinks from 4 to 3 would have reset inode 268437003 nlinks from 5 to 4 would have reset inode 268437017 nlinks from 3 to 2 would have reset inode 4136546825 nlinks from 5 to 4 would have reset inode 4168420144 nlinks from 7 to 4 - 10:54:24: verify link counts - 191040 of 202304 inodes done No modify flag set, skipping filesystem flush and exiting. From owner-xfs@oss.sgi.com Sun Nov 11 13:48:05 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 11 Nov 2007 13:48:09 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_66 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lABLm0vj013593 for ; Sun, 11 Nov 2007 13:48:03 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id IAA26196; Mon, 12 Nov 2007 08:48:01 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lABLm1dD102616706; Mon, 12 Nov 2007 08:48:01 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lABLm0ab102653059; Mon, 12 Nov 2007 08:48:00 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to 
dgc@sgi.com using -f Date: Mon, 12 Nov 2007 08:48:00 +1100 From: David Chinner To: Lachlan McIlroy Cc: xfs-dev , xfs-oss Subject: Re: [PATCH] bulkstat fixups Message-ID: <20071111214759.GS995458@sgi.com> References: <4733EEF2.9010504@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4733EEF2.9010504@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4750/Sun Nov 11 11:16:39 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13613 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs [Lachlan, can you wrap your email text at 72 columns for ease of quoting?] On Fri, Nov 09, 2007 at 04:24:02PM +1100, Lachlan McIlroy wrote: > Here's a collection of fixups for bulkstat for all the remaining issues. > > - sanity check for NULL user buffer in xfs_ioc_bulkstat[_compat]() OK. > - remove the special case for XFS_IOC_FSBULKSTAT with count == 1. This > special > case causes bulkstat to fail because the special case uses > xfs_bulkstat_single() > instead of xfs_bulkstat() and the two functions have different semantics. > xfs_bulkstat() will return the next inode after the one supplied while > skipping > internal inodes (ie quota inodes). xfs_bulkstat_single() will only > lookup the > inode supplied and return an error if it is an internal inode. Userspace visible change. What applications do we have that rely on this behaviour that will be broken by this change? > - in xfs_bulkstat(), need to initialise 'lastino' to the inode supplied so > in cases > where we return without examining any inodes the scan won't restart back at > zero. Ok. > - sanity check for valid *ubcountp values. Cannot sanity check for valid > ubuffer > here because some users of xfs_bulkstat() don't supply a buffer. Ok - the cases that supply a buffer are caught by the first checks you added, right?
> - checks against 'ubleft' (the space left in the user's buffer) should be > against > 'statstruct_size' which is the supplied minimum object size. The mixture > of > checks against statstruct_size and 0 was one of the reasons we were > skipping > inodes. Can you wrap these checks in a static inline function so that it is obvious what the correct way to check is and we don't reintroduce this problem? i.e. static inline int xfs_bulkstat_ubuffer_large_enough(ssize_t space) { return (space > sizeof(struct blah)); } That will also remove a stack variable.... > - if the formatter function returns BULKSTAT_RV_NOTHING and an error and > the error > is not ENOENT or EINVAL then we need to abort the scan. ENOENT is for > inodes that > are no longer valid and we just skip them. EINVAL is returned if we try > to lookup > an internal inode so we skip them too. For a DMF scan if the inode and > DMF > attribute cannot fit into the space left in the user's buffer it would > return > ERANGE. We didn't handle this error and skipped the inode. We would > continue to > skip inodes until one fitted into the user's buffer or we completed the > scan. ok. > - put back the recalculation of agino (that got removed with the last fix) > at the > end of the while loop. This is because the code at the start of the loop > expects > agino to be the last inode examined if it is non-zero. ok. > - if we found some inodes but then encountered an error, return success > this time > and the error next time. If the formatter aborted with ENOMEM we will > now return > this error but only if we couldn't read any inodes. Previously if we > encountered > ENOMEM without reading any inodes we returned a zero count and no error > which > falsely indicated the scan was complete. ok. FWIW - missing from this set of patches - cpu_relax() in the loops.
In the case where no I/O is required to do the scan, we can hold the cpu for a long time and that will hold off I/O completion, etc for the cpu bulkstat is running on. Hence after every cluster we scan we should cpu_relax() to allow other processes cpu time on that cpu. Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Sun Nov 11 15:57:36 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 11 Nov 2007 15:57:40 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS autolearn=ham version=3.3.0-r574664 Received: from sandeen.net (sandeen.net [209.173.210.139]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lABNvXbP028420 for ; Sun, 11 Nov 2007 15:57:35 -0800 Received: from liberator.sandeen.net (liberator.sandeen.net [10.0.0.4]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by sandeen.net (Postfix) with ESMTP id 1644F1802868A; Sun, 11 Nov 2007 17:57:37 -0600 (CST) Message-ID: <473796EF.6050104@sandeen.net> Date: Sun, 11 Nov 2007 17:57:35 -0600 From: Eric Sandeen User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728) MIME-Version: 1.0 To: xfs-masters@oss.sgi.com CC: xfs@oss.sgi.com, kernel-janitors@vger.kernel.org Subject: Re: [xfs-masters] [PATCH] fs/xfs: remove duplicated defines References: <20071111134351.106efb98@lucky.kitzblitz> In-Reply-To: <20071111134351.106efb98@lucky.kitzblitz> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4750/Sun Nov 11 11:16:39 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13614 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: sandeen@sandeen.net Precedence: bulk X-list: xfs Nicolas Kaiser wrote: > Remove duplicated defines. 
> > Signed-off-by: Nicolas Kaiser > --- Heh, each defined twice, but used 0 times in the kernel. Could probably just remove them altogether (though I guess btoc is used in xfstests/ltp/doio.c in userspace/xfstests, but that's the *only* place) -Eric > fs/xfs/linux-2.6/xfs_linux.h | 4 ---- > 1 file changed, 4 deletions(-) > > --- a/fs/xfs/linux-2.6/xfs_linux.h 2007-11-07 11:26:20.000000000 +0100 > +++ b/fs/xfs/linux-2.6/xfs_linux.h 2007-11-11 13:07:11.000000000 +0100 > @@ -167,12 +167,8 @@ > > /* clicks to bytes */ > #define ctob(x) ((__psunsigned_t)(x)<<BPCSHIFT) > -#define btoct(x) ((__psunsigned_t)(x)>>BPCSHIFT) > #define ctob64(x) ((__uint64_t)(x)<<BPCSHIFT) > > -/* bytes to clicks */ > -#define btoc(x) (((__psunsigned_t)(x)+(NBPC-1))>>BPCSHIFT) > - > #define ENOATTR ENODATA /* Attribute not found */ > #define EWRONGFS EINVAL /* Mount with wrong filesystem type */ > #define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */ > > From owner-xfs@oss.sgi.com Sun Nov 11 16:08:33 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 11 Nov 2007 16:08:37 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.4 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAC08Ro0029596 for ; Sun, 11 Nov 2007 16:08:31 -0800 Received: from timothy-shimmins-power-mac-g5.local (boing.melbourne.sgi.com [134.14.55.141]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA28797; Mon, 12 Nov 2007 11:08:25 +1100 Message-ID: <473799A9.9000200@sgi.com> Date: Mon, 12 Nov 2007 11:09:13 +1100 From: Timothy Shimmin User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728) MIME-Version: 1.0 To: Eric Sandeen CC: xfs-masters@oss.sgi.com, xfs@oss.sgi.com, kernel-janitors@vger.kernel.org Subject: Re: [xfs-masters] [PATCH] fs/xfs: remove duplicated defines
References: <20071111134351.106efb98@lucky.kitzblitz> <473796EF.6050104@sandeen.net> In-Reply-To: <473796EF.6050104@sandeen.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4750/Sun Nov 11 11:16:39 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13615 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: tes@sgi.com Precedence: bulk X-list: xfs Eric Sandeen wrote: > Nicolas Kaiser wrote: >> Remove duplicated defines. >> >> Signed-off-by: Nicolas Kaiser >> --- > > Heh, each defined twice, but used 0 times in the kernel. Could probably > just remove them altogether (though I guess btoc is used in > xfstests/ltp/doio.c in userspace/xfstests, but that's the *only* place) > Yes, that's what I was just noticing - how they are not used anyway. > grep -Ir btoc . | egrep -v 'tag|anot' ./linux-2.6/xfs_linux.h:#define btoc(x) (((__psunsigned_t)(x)+(NBPC-1))>>BPCSHIFT) ./linux-2.6/xfs_linux.h:#define btoct(x) ((__psunsigned_t)(x)>>BPCSHIFT) ./linux-2.6/xfs_linux.h:#define btoc64(x) (((__uint64_t)(x)+(NBPC-1))>>BPCSHIFT) ./linux-2.6/xfs_linux.h:#define btoct64(x) ((__uint64_t)(x)>>BPCSHIFT) ./linux-2.6/xfs_linux.h:#define btoct(x) ((__psunsigned_t)(x)>>BPCSHIFT) ./linux-2.6/xfs_linux.h:#define btoc(x) (((__psunsigned_t)(x)+(NBPC-1))>>BPCSHIFT) ./linux-2.6/xfs_buf.c: page_count = xfs_buf_btoc(end) - xfs_buf_btoct(bp->b_file_offset); ./linux-2.6/xfs_buf.c: page = bp->b_pages[xfs_buf_btoct(boff + bp->b_offset)]; ./linux-2.6/xfs_buf.h:#define xfs_buf_btoc(dd) (((dd) + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT) ./linux-2.6/xfs_buf.h:#define xfs_buf_btoct(dd) ((dd) >> PAGE_CACHE_SHIFT) --Tim > -Eric > >> fs/xfs/linux-2.6/xfs_linux.h | 4 ---- >> 1 file changed, 4 deletions(-) >> >> --- a/fs/xfs/linux-2.6/xfs_linux.h 2007-11-07 11:26:20.000000000 +0100 >> +++ b/fs/xfs/linux-2.6/xfs_linux.h 2007-11-11 13:07:11.000000000 +0100 >> @@ 
-167,12 +167,8 @@ >> >> /* clicks to bytes */ >> #define ctob(x) ((__psunsigned_t)(x)<<BPCSHIFT) >> -#define btoct(x) ((__psunsigned_t)(x)>>BPCSHIFT) >> #define ctob64(x) ((__uint64_t)(x)<<BPCSHIFT) >> >> -/* bytes to clicks */ >> -#define btoc(x) (((__psunsigned_t)(x)+(NBPC-1))>>BPCSHIFT) >> - >> #define ENOATTR ENODATA /* Attribute not found */ >> #define EWRONGFS EINVAL /* Mount with wrong filesystem type */ >> #define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */ >> >> > From owner-xfs@oss.sgi.com Sun Nov 11 17:58:19 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 11 Nov 2007 17:58:23 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAC1wFBK012219 for ; Sun, 11 Nov 2007 17:58:18 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA00726; Mon, 12 Nov 2007 12:58:15 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAC1wDdD102745149; Mon, 12 Nov 2007 12:58:14 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAC1wBcf102654081; Mon, 12 Nov 2007 12:58:11 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Mon, 12 Nov 2007 12:58:10 +1100 From: David Chinner To: Dirck Blaskey Cc: xfs@oss.sgi.com Subject: Re: files getting truncated on xfs Message-ID: <20071112015810.GX995458@sgi.com> References: <4736987E.4000208@danbala.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4736987E.4000208@danbala.com> User-Agent: Mutt/1.4.2.1i
X-Virus-Scanned: ClamAV 0.91.2/4750/Sun Nov 11 11:16:39 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13616 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Sat, Nov 10, 2007 at 09:51:58PM -0800, Dirck Blaskey wrote: > Although there was a truncation bug listed as introduced in 2.6.21 and > fixed before 2.6.22, can you be more specific? i.e. a link to the bug report? > I had a number of truncations occur with Ubuntu Gutsy Gibbon (kernel > 2.6.22) AMD64 build > using XFS on RAID 1 partition, so apparently it's not fixed. This > looked very much like > the bug described previously - a number of files truncated to block > boundaries on restart. What is the test case you have that shows this bug? Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Sun Nov 11 18:28:08 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 11 Nov 2007 18:28:14 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAC2S5mD016033 for ; Sun, 11 Nov 2007 18:28:07 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA01318 for ; Mon, 12 Nov 2007 13:28:10 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAC2S7dD102721159 for ; Mon, 12 Nov 2007 13:28:09 +1100 (AEDT) Received: (from xaiki@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAC2S6FX102646065 for xfs@oss.sgi.com; Mon, 12 Nov 2007 13:28:06 +1100 (AEDT) 
Date: Mon, 12 Nov 2007 13:28:06 +1100 From: Niv Sardi To: xfs@oss.sgi.com Subject: Re: Default mount options (that suck less). Message-ID: <20071112022806.GB102580300@melbourne.sgi.com> References: <20071029075657.GA84369978@melbourne.sgi.com> <20071031233516.GB88034736@melbourne.sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071031233516.GB88034736@melbourne.sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4750/Sun Nov 11 11:16:39 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13617 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: xaiki@sgi.com Precedence: bulk X-list: xfs No gnus is bad news ? Shall I interpret the lack of comments as "everything is fine, just commit the damn thing ?" Cheers, -- Niv From owner-xfs@oss.sgi.com Sun Nov 11 18:56:46 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 11 Nov 2007 18:56:49 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_66 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAC2udRs019827 for ; Sun, 11 Nov 2007 18:56:43 -0800 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA01873; Mon, 12 Nov 2007 13:56:35 +1100 Message-ID: <4737C11D.8030007@sgi.com> Date: Mon, 12 Nov 2007 13:57:33 +1100 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.6 (X11/20070728) MIME-Version: 1.0 To: David Chinner CC: xfs-dev , xfs-oss Subject: Re: [PATCH] bulkstat fixups References: <4733EEF2.9010504@sgi.com> <20071111214759.GS995458@sgi.com> In-Reply-To: 
<20071111214759.GS995458@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4750/Sun Nov 11 11:16:39 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13618 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs David Chinner wrote: > [Lachlan, can you wrap your email text at 72 columns for ease of quoting?] > > On Fri, Nov 09, 2007 at 04:24:02PM +1100, Lachlan McIlroy wrote: >> Here's a collection of fixups for bulkstat for all the remaining issues. >> >> - sanity check for NULL user buffer in xfs_ioc_bulkstat[_compat]() > > OK. > >> - remove the special case for XFS_IOC_FSBULKSTAT with count == 1. This >> special >> case causes bulkstat to fail because the special case uses >> xfs_bulkstat_single() >> instead of xfs_bulkstat() and the two functions have different semantics. >> xfs_bulkstat() will return the next inode after the one supplied while >> skipping >> internal inodes (ie quota inodes). xfs_bulkstat_single() will only >> lookup the >> inode supplied and return an error if it is an internal inode. > > Userspace visible change. What applications do we have that rely on this > behaviour that will be broken by this change? Any apps that rely on the existing behaviour are probably broken. If an app wants to call xfs_bulkstat_single() it should use XFS_IOC_FSBULKSTAT_SINGLE. Alex Elder reported he has a test program that exploited this bug to compare the results of xfs_bulkstat_one (does an iget) with the results of an xfs_bulkstat (copying the inode from the cluster). That should be easy to fix. Other than that it doesn't make sense for an app to use bulkstat to get one inode at a time but if they do it will now work correctly.
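For readers following the thread, the difference in semantics described above can be illustrated with a small userspace toy. This is only a sketch under stated assumptions, not the kernel code: the names toy_table, toy_bulkstat() and toy_bulkstat_single() are hypothetical, the inode numbers are made up, and the table is assumed to be sorted by inode number. It models a "bulk" call that returns the next non-internal inodes after the one supplied, and a "single" call that looks up exactly the inode supplied and fails with EINVAL for an internal inode.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy inode table, sorted by inode number.
 * 'internal' stands in for inodes such as quota inodes. */
struct toy_inode {
	uint64_t ino;
	bool     internal;
};

static const struct toy_inode toy_table[] = {
	{ 128, false },
	{ 129, true  },
	{ 130, true  },
	{ 131, false },
	{ 132, false },
};

#define TOY_NINODES (sizeof(toy_table) / sizeof(toy_table[0]))

/* "single" semantics: look up exactly the inode supplied.
 * Returns 0 on success, EINVAL for an internal inode, ENOENT if absent. */
static int toy_bulkstat_single(uint64_t ino, uint64_t *out)
{
	for (size_t i = 0; i < TOY_NINODES; i++) {
		if (toy_table[i].ino != ino)
			continue;
		if (toy_table[i].internal)
			return EINVAL;
		*out = ino;
		return 0;
	}
	return ENOENT;
}

/* "bulk" semantics: return up to 'count' inodes that come after *lastino,
 * silently skipping internal inodes. *lastino is advanced to the last
 * inode returned so a later call resumes from there. */
static size_t toy_bulkstat(uint64_t *lastino, uint64_t *buf, size_t count)
{
	size_t n = 0;

	for (size_t i = 0; i < TOY_NINODES && n < count; i++) {
		if (toy_table[i].ino <= *lastino)
			continue;	/* at or before the resume point */
		if (toy_table[i].internal)
			continue;	/* internal inodes are skipped, not errors */
		buf[n++] = toy_table[i].ino;
		*lastino = toy_table[i].ino;
	}
	return n;
}
```

In this toy, a bulk call with count == 1 starting from inode 128 steps over the two internal inodes and yields 131, whereas a single lookup of 129 fails with EINVAL. Routing the count == 1 case to the single lookup would therefore change what the caller sees, which is the mismatch the patch description refers to.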
> >> - in xfs_bulkstat(), need to initialise 'lastino' to the inode supplied so >> in cases >> were we return without examining any inodes the scan wont restart back at >> zero. > > Ok. > >> - sanity check for valid *ubcountp values. Cannot sanity check for valid >> ubuffer >> here because some users of xfs_bulkstat() don't supply a buffer. > > Ok - the cases that supply a buffer are caught by the first checks you > added, right? > >> - checks against 'ubleft' (the space left in the user's buffer) should be >> against >> 'statstruct_size' which is the supplied minimum object size. The mixture >> of >> checks against statstruct_size and 0 was one of the reasons we were >> skipping >> inodes. > > Can you wrap these checks in a static inline function so that it is obvious > what the correct way to check is and we don't reintroduce this porblem? i.e. > > static inline int > xfs_bulkstat_ubuffer_large_enough(ssize_t space) > { > return (space > sizeof(struct blah)); > } > > That will also remove a stack variable.... That won't work - statstruct_size is passed into xfs_bulkstat() so we don't know what 'blah' is. Maybe a macro would be easier. #define XFS_BULKSTAT_UBLEFT (ubleft >= statstruct_size) > >> - if the formatter function returns BULKSTAT_RV_NOTHING and an error and >> the error >> is not ENOENT or EINVAL then we need to abort the scan. ENOENT is for >> inodes that >> are no longer valid and we just skip them. EINVAL is returned if we try >> to lookup >> an internal inode so we skip them too. For a DMF scan if the inode and >> DMF >> attribute cannot fit into the space left in the user's buffer it would >> return >> ERANGE. We didn't handle this error and skipped the inode. We would >> continue to >> skip inodes until one fitted into the user's buffer or we completed the >> scan. > > ok. > >> - put back the recalculation of agino (that got removed with the last fix) >> at the >> end of the while loop. 
This is because the code at the start of the loop >> expects >> agino to be the last inode examined if it is non-zero. > > ok. > >> - if we found some inodes but then encountered an error, return success >> this time >> and the error next time. If the formatter aborted with ENOMEM we will >> now return >> this error but only if we couldn't read any inodes. Previously if we >> encountered >> ENOMEM without reading any inodes we returned a zero count and no error >> which >> falsely indicated the scan was complete. > > ok. > > FWIW - missing from this set of patches - cpu_relax() in the loops. In the case > where no I/O is required to do the scan, we can hold the cpu for a long time > and that will hold off I/O completion, etc for the cpu bulkstat is running on. > Hence after every cluster we scan we should cpu_relax() to allow other > processes cpu time on that cpu. > I don't get how cpu_relax() works. I see that it is called at times with a spinlock held so it wont trigger a context switch. Does it give interrupts a chance to run? It appears to be used where a minor delay is needed - I don't think we have any cases in xfs_bulkstat() where we need to wait for an event that isn't I/O. 
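For the ubleft check discussed earlier in this message, one possible shape is a static inline that takes both values as parameters, which avoids a parameterless macro and still works when statstruct_size is only known at run time. This is a minimal sketch in plain C, not the actual patch: the helper names, the use of size_t and the little fill loop are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: nonzero when at least one more stat structure
 * of statstruct_size bytes fits in the ubleft bytes remaining.
 */
static inline int
bulkstat_ubuffer_has_room(size_t ubleft, size_t statstruct_size)
{
	assert(statstruct_size > 0);
	return ubleft >= statstruct_size;
}

/*
 * Illustrative loop: count how many of 'available' records fit in the
 * user's buffer when every check goes through the one helper above.
 */
static size_t
bulkstat_fill(size_t ubleft, size_t statstruct_size, size_t available)
{
	size_t	copied = 0;

	while (copied < available &&
	       bulkstat_ubuffer_has_room(ubleft, statstruct_size)) {
		ubleft -= statstruct_size;
		copied++;
	}
	return copied;
}
```

Comparing against 0 instead would keep the loop going with, say, 28 bytes left for a 136-byte structure; routing every test through the single helper keeps all the checks consistent.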
From owner-xfs@oss.sgi.com Sun Nov 11 19:10:57 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 11 Nov 2007 19:11:00 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAC3ApTg021497 for ; Sun, 11 Nov 2007 19:10:55 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA02158; Mon, 12 Nov 2007 14:10:51 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAC3AodD101939657; Mon, 12 Nov 2007 14:10:51 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAC3AnJH102687302; Mon, 12 Nov 2007 14:10:49 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Mon, 12 Nov 2007 14:10:49 +1100 From: David Chinner To: Niv Sardi Cc: xfs@oss.sgi.com Subject: Re: Default mount options (that suck less). 
Message-ID: <20071112031049.GS66820511@sgi.com> References: <20071029075657.GA84369978@melbourne.sgi.com> <20071031233516.GB88034736@melbourne.sgi.com> <20071112022806.GB102580300@melbourne.sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071112022806.GB102580300@melbourne.sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4750/Sun Nov 11 11:16:39 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13619 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Mon, Nov 12, 2007 at 01:28:06PM +1100, Niv Sardi wrote: > No gnus is bad news ? > > Shall I interpret the lack of comments as "everything is fine, just commit the damn thing ?" Lack of comments to what? I haven't seen whatever patches you've sent, which means either you didn't tag the email as "[PATCH N/M]..." or "[REVIEW]..." and I didn't notice the attachments (nope, I see no patches in my archives of this thread) or, more likely, a spam filter along the way ate them. If you get no answer, you resend the patches with [PATCH, RESEND] in the title.... Can you please resend all the patches you want reviewed, one patch per email? Cheers, Dave.
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Sun Nov 11 20:05:17 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 11 Nov 2007 20:05:28 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from itchy (dhcp17.melbourne.sgi.com [134.14.55.17]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAC45CKU027726 for ; Sun, 11 Nov 2007 20:05:16 -0800 Received: by itchy (Postfix, from userid 16403) id 410ADBB01; Mon, 12 Nov 2007 14:48:49 +1100 (EST) From: xaiki@sgi.com To: xfs@oss.sgi.com Cc: Niv Sardi Subject: [[PATCH, RESEND]] Default to version 2 attributes. Date: Mon, 12 Nov 2007 14:48:45 +1100 Message-Id: <1194839329-22003-2-git-send-email-xaiki@sgi.com> X-Mailer: git-send-email 1.5.3.5 In-Reply-To: <1194839329-22003-1-git-send-email-xaiki@sgi.com> References: <20071031233516.GB88034736@melbourne.sgi.com> <1194839329-22003-1-git-send-email-xaiki@sgi.com> X-Virus-Scanned: ClamAV 0.91.2/4750/Sun Nov 11 11:16:39 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13622 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: xaiki@sgi.com Precedence: bulk X-list: xfs From: Niv Sardi --- xfsprogs/mkfs/xfs_mkfs.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/xfsprogs/mkfs/xfs_mkfs.c b/xfsprogs/mkfs/xfs_mkfs.c index 5f3299d..b378800 100644 --- a/xfsprogs/mkfs/xfs_mkfs.c +++ b/xfsprogs/mkfs/xfs_mkfs.c @@ -677,7 +677,7 @@ main( bindtextdomain(PACKAGE, LOCALEDIR); textdomain(PACKAGE); - attrversion = 0; + attrversion = 2; blflag = bsflag = slflag = ssflag = lslflag = lssflag = 0; blocklog = blocksize = 0; sectorlog = lsectorlog = XFS_MIN_SECTORSIZE_LOG; -- 1.5.3.5 From owner-xfs@oss.sgi.com Sun Nov 11 20:05:17 2007 Received: with ECARTIS (v1.0.0; list xfs); 
Sun, 11 Nov 2007 20:05:28 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_46, J_CHICKENPOX_47 autolearn=no version=3.3.0-r574664 Received: from itchy (dhcp17.melbourne.sgi.com [134.14.55.17]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAC45CTT027725 for ; Sun, 11 Nov 2007 20:05:16 -0800 Received: by itchy (Postfix, from userid 16403) id 68F55BB04; Mon, 12 Nov 2007 14:48:49 +1100 (EST) From: xaiki@sgi.com To: xfs@oss.sgi.com Cc: Niv Sardi Subject: [[PATCH, RESEND]] reduce imaxpct for big filesystems, Date: Mon, 12 Nov 2007 14:48:48 +1100 Message-Id: <1194839329-22003-5-git-send-email-xaiki@sgi.com> X-Mailer: git-send-email 1.5.3.5 In-Reply-To: <1194839329-22003-4-git-send-email-xaiki@sgi.com> References: <20071031233516.GB88034736@melbourne.sgi.com> <1194839329-22003-1-git-send-email-xaiki@sgi.com> <1194839329-22003-2-git-send-email-xaiki@sgi.com> <1194839329-22003-3-git-send-email-xaiki@sgi.com> <1194839329-22003-4-git-send-email-xaiki@sgi.com> X-Virus-Scanned: ClamAV 0.91.2/4750/Sun Nov 11 11:16:39 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13621 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: xaiki@sgi.com Precedence: bulk X-list: xfs From: Niv Sardi imaxpct is set to 25% for FS < 1 TB, then 5% for FS < 50 TB, and then 1%. 
It is implemented as a step function in calc_default_imaxpct() --- xfsprogs/mkfs/xfs_mkfs.c | 19 +++++++++++++++++-- 1 files changed, 17 insertions(+), 2 deletions(-) diff --git a/xfsprogs/mkfs/xfs_mkfs.c b/xfsprogs/mkfs/xfs_mkfs.c index 3689eb7..78c2c77 100644 --- a/xfsprogs/mkfs/xfs_mkfs.c +++ b/xfsprogs/mkfs/xfs_mkfs.c @@ -374,6 +374,21 @@ validate_log_size(__uint64_t logblocks, int blocklog, int min_logblocks) } } +int +calc_default_imaxpct( + int blocklog, + __uint64_t dblocks) +{ + if (dblocks < TERABYTES(1, blocklog)) { + return XFS_DFL_IMAXIMUM_PCT; + } else if (dblocks < TERABYTES(50, blocklog)) { + return 5; + } + + return 1; +} + + void calc_default_ag_geometry( int blocklog, @@ -1986,7 +2001,7 @@ an AG size that is one stripe unit smaller, for example %llu.\n"), dfile, isize, (long long)agcount, (long long)agsize, "", sectorsize, attrversion, "", blocksize, (long long)dblocks, - imflag ? imaxpct : XFS_DFL_IMAXIMUM_PCT, + calc_default_imaxpct(blocklog, dblocks), "", dsunit, dswidth, dirversion, dirversion == 1 ? blocksize : dirblocksize, logfile, 1 << blocklog, (long long)logblocks, @@ -2023,7 +2038,7 @@ an AG size that is one stripe unit smaller, for example %llu.\n"), (__uint8_t)(rtextents ? libxfs_highbit32((unsigned int)rtextents) : 0); sbp->sb_inprogress = 1; /* mkfs is in progress */ - sbp->sb_imax_pct = imflag ? 
imaxpct : XFS_DFL_IMAXIMUM_PCT; + sbp->sb_imax_pct = calc_default_imaxpct(blocklog, dblocks); sbp->sb_icount = 0; sbp->sb_ifree = 0; sbp->sb_fdblocks = dblocks - agcount * XFS_PREALLOC_BLOCKS(mp) - -- 1.5.3.5 From owner-xfs@oss.sgi.com Sun Nov 11 20:05:17 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 11 Nov 2007 20:05:27 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.1 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from itchy (dhcp17.melbourne.sgi.com [134.14.55.17]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAC45Cb7027727 for ; Sun, 11 Nov 2007 20:05:16 -0800 Received: by itchy (Postfix, from userid 16403) id 2A58DBAFE; Mon, 12 Nov 2007 14:48:49 +1100 (EST) From: xaiki@sgi.com To: xfs@oss.sgi.com Cc: Niv Sardi Subject: [[PATCH, RESEND]] Default to log version 2 Date: Mon, 12 Nov 2007 14:48:44 +1100 Message-Id: <1194839329-22003-1-git-send-email-xaiki@sgi.com> X-Mailer: git-send-email 1.5.3.5 In-Reply-To: <20071031233516.GB88034736@melbourne.sgi.com> References: <20071031233516.GB88034736@melbourne.sgi.com> X-Virus-Scanned: ClamAV 0.91.2/4750/Sun Nov 11 11:16:39 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13620 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: xaiki@sgi.com Precedence: bulk X-list: xfs From: Niv Sardi --- xfsprogs/mkfs/xfs_mkfs.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/xfsprogs/mkfs/xfs_mkfs.c b/xfsprogs/mkfs/xfs_mkfs.c index 6e84a4e..5f3299d 100644 --- a/xfsprogs/mkfs/xfs_mkfs.c +++ b/xfsprogs/mkfs/xfs_mkfs.c @@ -686,7 +686,7 @@ main( ilflag = imflag = ipflag = isflag = 0; liflag = laflag = lsflag = ldflag = lvflag = 0; loginternal = 1; - logversion = 1; + logversion = 2; logagno = logblocks = rtblocks = rtextblocks = 0; Nflag = nlflag = nsflag = nvflag = 0; dirblocklog = dirblocksize 
= dirversion = 0; -- 1.5.3.5 From owner-xfs@oss.sgi.com Sun Nov 11 20:05:21 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 11 Nov 2007 20:05:31 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_47, J_CHICKENPOX_51,J_CHICKENPOX_71,J_CHICKENPOX_75 autolearn=no version=3.3.0-r574664 Received: from itchy (dhcp17.melbourne.sgi.com [134.14.55.17]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAC45Iig027766 for ; Sun, 11 Nov 2007 20:05:20 -0800 Received: by itchy (Postfix, from userid 16403) id 57C23BB02; Mon, 12 Nov 2007 14:48:49 +1100 (EST) From: xaiki@sgi.com To: xfs@oss.sgi.com Cc: Niv Sardi Subject: [[PATCH, RESEND]] Drop the ability to turn unwritten extents off completly Date: Mon, 12 Nov 2007 14:48:46 +1100 Message-Id: <1194839329-22003-3-git-send-email-xaiki@sgi.com> X-Mailer: git-send-email 1.5.3.5 In-Reply-To: <1194839329-22003-2-git-send-email-xaiki@sgi.com> References: <20071031233516.GB88034736@melbourne.sgi.com> <1194839329-22003-1-git-send-email-xaiki@sgi.com> <1194839329-22003-2-git-send-email-xaiki@sgi.com> X-Virus-Scanned: ClamAV 0.91.2/4750/Sun Nov 11 11:16:39 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13625 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: xaiki@sgi.com Precedence: bulk X-list: xfs From: Niv Sardi --- xfsprogs/mkfs/xfs_mkfs.c | 38 +++++++++++++------------------------- xfsprogs/mkfs/xfs_mkfs.h | 6 +++--- 2 files changed, 16 insertions(+), 28 deletions(-) diff --git a/xfsprogs/mkfs/xfs_mkfs.c b/xfsprogs/mkfs/xfs_mkfs.c index b378800..3689eb7 100644 --- a/xfsprogs/mkfs/xfs_mkfs.c +++ b/xfsprogs/mkfs/xfs_mkfs.c @@ -56,25 +56,23 @@ char *dopts[] = { "sunit", #define D_SWIDTH 5 "swidth", -#define D_UNWRITTEN 6 - "unwritten", -#define D_AGSIZE 7 +#define D_AGSIZE 6 "agsize", -#define D_SU 8 
+#define D_SU 7 "su", -#define D_SW 9 +#define D_SW 8 "sw", -#define D_SECTLOG 10 +#define D_SECTLOG 9 "sectlog", -#define D_SECTSIZE 11 +#define D_SECTSIZE 10 "sectsize", -#define D_NOALIGN 12 +#define D_NOALIGN 11 "noalign", -#define D_RTINHERIT 13 +#define D_RTINHERIT 12 "rtinherit", -#define D_PROJINHERIT 14 +#define D_PROJINHERIT 13 "projinherit", -#define D_EXTSZINHERIT 15 +#define D_EXTSZINHERIT 14 "extszinherit", NULL }; @@ -604,7 +602,6 @@ main( int dsw; int dsunit; int dswidth; - int extent_flagging; int force_overwrite; struct fsxattr fsx; int iaflag; @@ -697,7 +694,6 @@ main( dsize = logsize = rtsize = rtextsize = protofile = NULL; dsu = dsw = dsunit = dswidth = lalign = lsu = lsunit = 0; nodsflag = norsflag = 0; - extent_flagging = 1; force_overwrite = 0; worst_freelist = 0; lazy_sb_counters = 0; @@ -877,14 +873,6 @@ main( D_NOALIGN); nodsflag = 1; break; - case D_UNWRITTEN: - if (!value) - reqval('d', dopts, D_UNWRITTEN); - c = atoi(value); - if (c < 0 || c > 1) - illegal(value, "d unwritten"); - extent_flagging = c; - break; case D_SECTLOG: if (!value) reqval('d', dopts, D_SECTLOG); @@ -1990,7 +1978,7 @@ an AG size that is one stripe unit smaller, for example %llu.\n"), "meta-data=%-22s isize=%-6d agcount=%lld, agsize=%lld blks\n" " =%-22s sectsz=%-5u attr=%u\n" "data =%-22s bsize=%-6u blocks=%llu, imaxpct=%u\n" - " =%-22s sunit=%-6u swidth=%u blks, unwritten=%u\n" + " =%-22s sunit=%-6u swidth=%u blks\n" "naming =version %-14u bsize=%-6u\n" "log =%-22s bsize=%-6d blocks=%lld, version=%d\n" " =%-22s sectsz=%-5u sunit=%d blks, lazy-count=%d\n" @@ -1999,7 +1987,7 @@ an AG size that is one stripe unit smaller, for example %llu.\n"), "", sectorsize, attrversion, "", blocksize, (long long)dblocks, imflag ? imaxpct : XFS_DFL_IMAXIMUM_PCT, - "", dsunit, dswidth, extent_flagging, + "", dsunit, dswidth, dirversion, dirversion == 1 ? 
blocksize : dirblocksize, logfile, 1 << blocklog, (long long)logblocks, logversion, "", lsectorsize, lsunit, lazy_sb_counters, @@ -2066,7 +2054,7 @@ an AG size that is one stripe unit smaller, for example %llu.\n"), } sbp->sb_features2 = XFS_SB_VERSION2_MKFS(lazy_sb_counters, attrversion == 2, 0); sbp->sb_versionnum = XFS_SB_VERSION_MKFS( - iaflag, dsunit != 0, extent_flagging, + iaflag, dsunit != 0, dirversion == 2, logversion == 2, attrversion == 1, (sectorsize != BBSIZE || lsectorsize != BBSIZE), sbp->sb_features2 != 0); @@ -2537,7 +2525,7 @@ usage( void ) /* blocksize */ [-b log=n|size=num]\n\ /* data subvol */ [-d agcount=n,agsize=n,file,name=xxx,size=num,\n\ (sunit=value,swidth=value|su=num,sw=num),\n\ - sectlog=n|sectsize=num,unwritten=0|1]\n\ + sectlog=n|sectsize=num\n\ /* inode size */ [-i log=n|perblock=n|size=num,maxpct=n,attr=0|1|2]\n\ /* log subvol */ [-l agnum=n,internal,size=num,logdev=xxx,version=n\n\ sunit=value|su=num,sectlog=n|sectsize=num,\n\ diff --git a/xfsprogs/mkfs/xfs_mkfs.h b/xfsprogs/mkfs/xfs_mkfs.h index 1ab85fd..f19f917 100644 --- a/xfsprogs/mkfs/xfs_mkfs.h +++ b/xfsprogs/mkfs/xfs_mkfs.h @@ -18,12 +18,12 @@ #ifndef __XFS_MKFS_H__ #define __XFS_MKFS_H__ -#define XFS_SB_VERSION_MKFS(ia,dia,extflag,dir2,log2,attr1,sflag,more) (\ - ((ia)||(dia)||(extflag)||(dir2)||(log2)||(attr1)||(sflag)||(more)) ? \ +#define XFS_SB_VERSION_MKFS(ia,dia,dir2,log2,attr1,sflag,more) (\ + ((ia)||(dia)||(dir2)||(log2)||(attr1)||(sflag)||(more)) ? \ ( XFS_SB_VERSION_4 | \ ((ia) ? XFS_SB_VERSION_ALIGNBIT : 0) | \ ((dia) ? XFS_SB_VERSION_DALIGNBIT : 0) | \ - ((extflag) ? XFS_SB_VERSION_EXTFLGBIT : 0) | \ + (XFS_SB_VERSION_EXTFLGBIT) | \ ((dir2) ? XFS_SB_VERSION_DIRV2BIT : 0) | \ ((log2) ? XFS_SB_VERSION_LOGV2BIT : 0) | \ ((attr1) ? 
XFS_SB_VERSION_ATTRBIT : 0) | \ -- 1.5.3.5 From owner-xfs@oss.sgi.com Sun Nov 11 20:05:18 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 11 Nov 2007 20:05:30 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.1 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from itchy (dhcp17.melbourne.sgi.com [134.14.55.17]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAC45CMH027728 for ; Sun, 11 Nov 2007 20:05:16 -0800 Received: by itchy (Postfix, from userid 16403) id 5D3D9BB03; Mon, 12 Nov 2007 14:48:49 +1100 (EST) From: xaiki@sgi.com To: xfs@oss.sgi.com Cc: Niv Sardi Subject: [[PATCH, RESEND]] V2 inodes per default, and move DFL bits to XFS_DFL_SB_VERSION_BITS, Date: Mon, 12 Nov 2007 14:48:47 +1100 Message-Id: <1194839329-22003-4-git-send-email-xaiki@sgi.com> X-Mailer: git-send-email 1.5.3.5 In-Reply-To: <1194839329-22003-3-git-send-email-xaiki@sgi.com> References: <20071031233516.GB88034736@melbourne.sgi.com> <1194839329-22003-1-git-send-email-xaiki@sgi.com> <1194839329-22003-2-git-send-email-xaiki@sgi.com> <1194839329-22003-3-git-send-email-xaiki@sgi.com> X-Virus-Scanned: ClamAV 0.91.2/4750/Sun Nov 11 11:16:39 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13624 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: xaiki@sgi.com Precedence: bulk X-list: xfs From: Niv Sardi Activate XFS_SB_VERSION_NLINKBIT by default, which will enable V2 inodes. Refactor the bits that we always want into XFS_DFL_SB_VERSION_BITS.
--- xfsprogs/mkfs/xfs_mkfs.h | 4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/xfsprogs/mkfs/xfs_mkfs.h b/xfsprogs/mkfs/xfs_mkfs.h index f19f917..9291e33 100644 --- a/xfsprogs/mkfs/xfs_mkfs.h +++ b/xfsprogs/mkfs/xfs_mkfs.h @@ -18,17 +18,19 @@ #ifndef __XFS_MKFS_H__ #define __XFS_MKFS_H__ +#define XFS_DFL_SB_VERSION_BITS XFS_SB_VERSION_NLINKBIT|XFS_SB_VERSION_EXTFLGBIT + #define XFS_SB_VERSION_MKFS(ia,dia,dir2,log2,attr1,sflag,more) (\ ((ia)||(dia)||(dir2)||(log2)||(attr1)||(sflag)||(more)) ? \ ( XFS_SB_VERSION_4 | \ ((ia) ? XFS_SB_VERSION_ALIGNBIT : 0) | \ ((dia) ? XFS_SB_VERSION_DALIGNBIT : 0) | \ - (XFS_SB_VERSION_EXTFLGBIT) | \ ((dir2) ? XFS_SB_VERSION_DIRV2BIT : 0) | \ ((log2) ? XFS_SB_VERSION_LOGV2BIT : 0) | \ ((attr1) ? XFS_SB_VERSION_ATTRBIT : 0) | \ ((sflag) ? XFS_SB_VERSION_SECTORBIT : 0) | \ ((more) ? XFS_SB_VERSION_MOREBITSBIT : 0) | \ + XFS_DFL_SB_VERSION_BITS | \ 0 ) : XFS_SB_VERSION_1 ) #define XFS_SB_VERSION2_MKFS(lazycount, attr2, parent) (\ -- 1.5.3.5 From owner-xfs@oss.sgi.com Sun Nov 11 20:05:18 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 11 Nov 2007 20:05:29 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.2 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from itchy (dhcp17.melbourne.sgi.com [134.14.55.17]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAC45CAG027724 for ; Sun, 11 Nov 2007 20:05:16 -0800 Received: by itchy (Postfix, from userid 16403) id 6ED36BB05; Mon, 12 Nov 2007 14:48:49 +1100 (EST) From: xaiki@sgi.com To: xfs@oss.sgi.com Cc: Niv Sardi Subject: [[PATCH, RESEND]] less AGs for single disks configs. 
Date: Mon, 12 Nov 2007 14:48:49 +1100 Message-Id: <1194839329-22003-6-git-send-email-xaiki@sgi.com> X-Mailer: git-send-email 1.5.3.5 In-Reply-To: <1194839329-22003-5-git-send-email-xaiki@sgi.com> References: <20071031233516.GB88034736@melbourne.sgi.com> <1194839329-22003-1-git-send-email-xaiki@sgi.com> <1194839329-22003-2-git-send-email-xaiki@sgi.com> <1194839329-22003-3-git-send-email-xaiki@sgi.com> <1194839329-22003-4-git-send-email-xaiki@sgi.com> <1194839329-22003-5-git-send-email-xaiki@sgi.com> X-Virus-Scanned: ClamAV 0.91.2/4750/Sun Nov 11 11:16:39 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13623 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: xaiki@sgi.com Precedence: bulk X-list: xfs From: Niv Sardi get the underlying structure with get_subvol_stripe_wrapper(), and pass sunit | swidth as an argument to calc_default_ag_geometry(). if it is set, get the AG sizes bigger. this also cleans up a typo: - } else if (daflag) /* User-specified AG size */ + } else if (daflag) /* User-specified AG count */ --- xfsprogs/mkfs/xfs_mkfs.c | 18 ++++++++++++------ 1 files changed, 12 insertions(+), 6 deletions(-) diff --git a/xfsprogs/mkfs/xfs_mkfs.c b/xfsprogs/mkfs/xfs_mkfs.c index 78c2c77..4cf9975 100644 --- a/xfsprogs/mkfs/xfs_mkfs.c +++ b/xfsprogs/mkfs/xfs_mkfs.c @@ -393,6 +393,7 @@ void calc_default_ag_geometry( int blocklog, __uint64_t dblocks, + int multidisk, __uint64_t *agsize, __uint64_t *agcount) { @@ -428,12 +429,13 @@ calc_default_ag_geometry( * * This scales us up smoothly between min/max AG sizes. 
*/ + if (dblocks > GIGABYTES(512, blocklog)) - shift = 5; + shift = 5 - (multidisk == 0); else if (dblocks > GIGABYTES(8, blocklog)) - shift = 4; + shift = 4 - (multidisk == 0); else if (dblocks >= MEGABYTES(128, blocklog)) - shift = 3; + shift = 3 - (multidisk == 0); else ASSERT(0); blocks = dblocks >> shift; @@ -1771,10 +1773,14 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"), agsize /= blocksize; agcount = dblocks / agsize + (dblocks % agsize != 0); - } else if (daflag) /* User-specified AG size */ + } else if (daflag) /* User-specified AG count */ agsize = dblocks / agcount + (dblocks % agcount != 0); - else - calc_default_ag_geometry(blocklog, dblocks, &agsize, &agcount); + else { + get_subvol_stripe_wrapper(dfile, SVTYPE_DATA, + &xlv_dsunit, &xlv_dswidth, &sectoralign), + calc_default_ag_geometry(blocklog, dblocks, xlv_dsunit | xlv_dswidth, + &agsize, &agcount); + } /* * If the last AG is too small, reduce the filesystem size -- 1.5.3.5 From owner-xfs@oss.sgi.com Sun Nov 11 20:11:26 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 11 Nov 2007 20:11:29 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_66 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAC4BKv8030644 for ; Sun, 11 Nov 2007 20:11:23 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA03617; Mon, 12 Nov 2007 15:11:25 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAC4BMdD101333304; Mon, 12 Nov 2007 15:11:22 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit)
id lAC4BLuU100083838; Mon, 12 Nov 2007 15:11:21 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Mon, 12 Nov 2007 15:11:21 +1100 From: David Chinner To: Lachlan McIlroy Cc: David Chinner , xfs-dev , xfs-oss Subject: Re: [PATCH] bulkstat fixups Message-ID: <20071112041121.GT66820511@sgi.com> References: <4733EEF2.9010504@sgi.com> <20071111214759.GS995458@sgi.com> <4737C11D.8030007@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4737C11D.8030007@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4750/Sun Nov 11 11:16:39 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13626 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Mon, Nov 12, 2007 at 01:57:33PM +1100, Lachlan McIlroy wrote: > David Chinner wrote: > >[Lachlan, can you wrap your email text at 72 columns for ease of quoting?] > > > >On Fri, Nov 09, 2007 at 04:24:02PM +1100, Lachlan McIlroy wrote: > >>Here's a collection of fixups for bulkstat for all the remaining issues. > >> > >>- sanity check for NULL user buffer in xfs_ioc_bulkstat[_compat]() > > > >OK. > > > >>- remove the special case for XFS_IOC_FSBULKSTAT with count == 1. This > >>special > >> case causes bulkstat to fail because the special case uses > >> xfs_bulkstat_single() > >> instead of xfs_bulkstat() and the two functions have different > >> semantics. > >> xfs_bulkstat() will return the next inode after the one supplied while > >> skipping > >> internal inodes (ie quota inodes). xfs_bulkstate_single() will only > >> lookup the > >> inode supplied and return an error if it is an internal inode. > > > >Userspace visile change. What applications do we have that rely on this > >behaviour that will be broken by this change? 
> > Any apps that rely on the existing behaviour are probably broken. If an app > wants to call xfs_bulkstat_single() it should use XFS_IOC_FSBULKSTAT_SINGLE. Perhaps, but we can't arbitrarily decide that those apps will now break on a new kernel with this change. At minimum we need to audit all of the code we have that uses bulkstat for such breakage (including DMF!) before we make a change like this. > >>- checks against 'ubleft' (the space left in the user's buffer) should be > >>against > >> 'statstruct_size' which is the supplied minimum object size. The > >> mixture of > >> checks against statstruct_size and 0 was one of the reasons we were > >> skipping > >> inodes. > > > >Can you wrap these checks in a static inline function so that it is obvious > >what the correct way to check is and we don't reintroduce this porblem? > >i.e. > > > >static inline int > >xfs_bulkstat_ubuffer_large_enough(ssize_t space) > >{ > > return (space > sizeof(struct blah)); > >} > > > >That will also remove a stack variable.... > > That won't work - statstruct_size is passed into xfs_bulkstat() so we don't > know what 'blah' is. Maybe a macro would be easier. > > #define XFS_BULKSTAT_UBLEFT (ubleft >= statstruct_size) Yeah, something like that, but I don't like macros with no parameters used like that.... > >FWIW - missing from this set of patches - cpu_relax() in the loops. In the > >case > >where no I/O is required to do the scan, we can hold the cpu for a long > >time > >and that will hold off I/O completion, etc for the cpu bulkstat is running > >on. > >Hence after every cluster we scan we should cpu_relax() to allow other > >processes cpu time on that cpu. > > > > I don't get how cpu_relax() works. I see that it is called at times with a > spinlock held so it wont trigger a context switch. Does it give interrupts > a chance to run? Sorry, my mistake - confused cpu_relax() with cond_resched(). 
take the above paragraph and s/cpu_relax/cond_resched/g > It appears to be used where a minor delay is needed - I don't think we have > any > cases in xfs_bulkstat() where we need to wait for an event that isn't I/O. The issue is when we're hitting cached buffers and we never end up waiting for I/O - we will then monopolise the cpu we are running on and hold off all other processing. It's antisocial and leads to high latencies for other code. Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Sun Nov 11 22:24:33 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 11 Nov 2007 22:24:43 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAC6OP1d014663 for ; Sun, 11 Nov 2007 22:24:29 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA06259; Mon, 12 Nov 2007 17:24:26 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAC6OPdD101200267; Mon, 12 Nov 2007 17:24:26 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAC6ONIs102754194; Mon, 12 Nov 2007 17:24:23 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Mon, 12 Nov 2007 17:24:23 +1100 From: David Chinner To: xaiki@sgi.com Cc: xfs@oss.sgi.com Subject: Re: [[PATCH, RESEND]] Default to version 2 attributes. 
Message-ID: <20071112062423.GZ66820511@sgi.com> References: <20071031233516.GB88034736@melbourne.sgi.com> <1194839329-22003-1-git-send-email-xaiki@sgi.com> <1194839329-22003-2-git-send-email-xaiki@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1194839329-22003-2-git-send-email-xaiki@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4750/Sun Nov 11 11:16:39 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13628 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Mon, Nov 12, 2007 at 02:48:45PM +1100, xaiki@sgi.com wrote: > From: Niv Sardi > Description? > --- > xfsprogs/mkfs/xfs_mkfs.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/xfsprogs/mkfs/xfs_mkfs.c b/xfsprogs/mkfs/xfs_mkfs.c > index 5f3299d..b378800 100644 > --- a/xfsprogs/mkfs/xfs_mkfs.c > +++ b/xfsprogs/mkfs/xfs_mkfs.c > @@ -677,7 +677,7 @@ main( > bindtextdomain(PACKAGE, LOCALEDIR); > textdomain(PACKAGE); > > - attrversion = 0; > + attrversion = 2; > blflag = bsflag = slflag = ssflag = lslflag = lssflag = 0; > blocklog = blocksize = 0; > sectorlog = lsectorlog = XFS_MIN_SECTORSIZE_LOG; ok. Cheers, Dave. 
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Sun Nov 11 22:23:58 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 11 Nov 2007 22:24:21 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAC6Nq7J014592 for ; Sun, 11 Nov 2007 22:23:57 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA06247; Mon, 12 Nov 2007 17:23:52 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAC6NodD102718351; Mon, 12 Nov 2007 17:23:51 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAC6NnGo102920682; Mon, 12 Nov 2007 17:23:49 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Mon, 12 Nov 2007 17:23:48 +1100 From: David Chinner To: xaiki@sgi.com Cc: xfs@oss.sgi.com Subject: Re: [[PATCH, RESEND]] Default to log version 2 Message-ID: <20071112062348.GY66820511@sgi.com> References: <20071031233516.GB88034736@melbourne.sgi.com> <1194839329-22003-1-git-send-email-xaiki@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1194839329-22003-1-git-send-email-xaiki@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4750/Sun Nov 11 11:16:39 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13627 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Mon, Nov 12, 
2007 at 02:48:44PM +1100, xaiki@sgi.com wrote: > From: Niv Sardi > Description? > --- > xfsprogs/mkfs/xfs_mkfs.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/xfsprogs/mkfs/xfs_mkfs.c b/xfsprogs/mkfs/xfs_mkfs.c > index 6e84a4e..5f3299d 100644 > --- a/xfsprogs/mkfs/xfs_mkfs.c > +++ b/xfsprogs/mkfs/xfs_mkfs.c > @@ -686,7 +686,7 @@ main( > ilflag = imflag = ipflag = isflag = 0; > liflag = laflag = lsflag = ldflag = lvflag = 0; > loginternal = 1; > - logversion = 1; > + logversion = 2; > logagno = logblocks = rtblocks = rtextblocks = 0; > Nflag = nlflag = nsflag = nvflag = 0; > dirblocklog = dirblocksize = dirversion = 0; ok. Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Sun Nov 11 22:28:04 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 11 Nov 2007 22:28:10 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAC6Ru8s015578 for ; Sun, 11 Nov 2007 22:28:02 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA06342; Mon, 12 Nov 2007 17:27:57 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAC6RudD102862746; Mon, 12 Nov 2007 17:27:57 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAC6RsAY102941840; Mon, 12 Nov 2007 17:27:54 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Mon, 12 Nov 2007 17:27:54 +1100 From: David Chinner To: xaiki@sgi.com Cc: xfs@oss.sgi.com Subject: Re: 
[[PATCH, RESEND]] Drop the ability to turn unwritten extents off completly Message-ID: <20071112062754.GA66820511@sgi.com> References: <20071031233516.GB88034736@melbourne.sgi.com> <1194839329-22003-1-git-send-email-xaiki@sgi.com> <1194839329-22003-2-git-send-email-xaiki@sgi.com> <1194839329-22003-3-git-send-email-xaiki@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1194839329-22003-3-git-send-email-xaiki@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4750/Sun Nov 11 11:16:39 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13629 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Mon, Nov 12, 2007 at 02:48:46PM +1100, xaiki@sgi.com wrote: > From: Niv Sardi > This needs a good description for why it's being removed and the man pages need updating. (e.g. mkfs.xfs) The code looks good, but NACK until the man page updates is done as well. Cheers, dave. 
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Sun Nov 11 22:31:42 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 11 Nov 2007 22:31:49 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAC6Vbo4016474 for ; Sun, 11 Nov 2007 22:31:40 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA06507; Mon, 12 Nov 2007 17:31:38 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAC6VbdD102837116; Mon, 12 Nov 2007 17:31:38 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAC6VaHM102914599; Mon, 12 Nov 2007 17:31:36 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Mon, 12 Nov 2007 17:31:36 +1100 From: David Chinner To: xaiki@sgi.com Cc: xfs@oss.sgi.com Subject: Re: [[PATCH, RESEND]] V2 inodes per default, and move DFL bits to XFS_DFL_SB_VERSION_BITS, Message-ID: <20071112063136.GB66820511@sgi.com> References: <20071031233516.GB88034736@melbourne.sgi.com> <1194839329-22003-1-git-send-email-xaiki@sgi.com> <1194839329-22003-2-git-send-email-xaiki@sgi.com> <1194839329-22003-3-git-send-email-xaiki@sgi.com> <1194839329-22003-4-git-send-email-xaiki@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1194839329-22003-4-git-send-email-xaiki@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4750/Sun Nov 11 11:16:39 2007 on oss.sgi.com X-Virus-Status: Clean 
X-archive-position: 13630 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Mon, Nov 12, 2007 at 02:48:47PM +1100, xaiki@sgi.com wrote: > From: Niv Sardi > > Activate XFS_SB_VERSION_NLINKBIT per default, witch will enable V2 INODES. > refactor bits that we want everytime in XFS_DFL_SB_VERSION_BITS. Yay! A description ;) s/witch/which/ > --- > xfsprogs/mkfs/xfs_mkfs.h | 4 +++- > 1 files changed, 3 insertions(+), 1 deletions(-) > > diff --git a/xfsprogs/mkfs/xfs_mkfs.h b/xfsprogs/mkfs/xfs_mkfs.h > index f19f917..9291e33 100644 > --- a/xfsprogs/mkfs/xfs_mkfs.h > +++ b/xfsprogs/mkfs/xfs_mkfs.h > @@ -18,17 +18,19 @@ > #ifndef __XFS_MKFS_H__ > #define __XFS_MKFS_H__ > > +#define XFS_DFL_SB_VERSION_BITS XFS_SB_VERSION_NLINKBIT|XFS_SB_VERSION_EXTFLGBIT > + I suggest splitting this over multiple lines and calling it XFS_DEFAULT_SB_VERSION_BITS. #define XFS_DEFAULT_SB_VERSION_BITS \ (XFS_SB_VERSION_NLINKBIT | \ XFS_SB_VERSION_EXTFLGBIT) Cheers, Dave. 
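A note on why the parenthesized form suggested above matters: a bare `A|B` macro body can bind unexpectedly once it is expanded inside a larger expression. The sketch below uses hypothetical `DEMO_*` names and illustrative bit values (neither is taken from the XFS headers) to show the difference when the macro is negated to build a clearing mask.

```c
#include <assert.h>

/* Illustrative bit values only; assumptions, not copied from XFS headers. */
#define DEMO_NLINKBIT   0x0020u
#define DEMO_EXTFLGBIT  0x1000u

/* Bare body, in the style of the macro in the patch under review. */
#define DEMO_BITS_BARE  DEMO_NLINKBIT|DEMO_EXTFLGBIT
/* Parenthesized body, in the style suggested in the reply. */
#define DEMO_BITS_SAFE  (DEMO_NLINKBIT | DEMO_EXTFLGBIT)

/* Try to clear the default bits; ~ binds only to the first bit here. */
unsigned int demo_clear_bare(unsigned int v)
{
    unsigned int keep = ~DEMO_BITS_BARE;   /* (~NLINK) | EXTFLG */
    return v & keep;
}

/* Same operation with the parenthesized macro; both bits are cleared. */
unsigned int demo_clear_safe(unsigned int v)
{
    unsigned int keep = ~DEMO_BITS_SAFE;   /* ~(NLINK | EXTFLG) */
    return v & keep;
}

/* Self-check documenting the difference between the two macros. */
void demo_check(void)
{
    unsigned int both = DEMO_NLINKBIT | DEMO_EXTFLGBIT;

    assert(demo_clear_safe(both) == 0u);
    assert(demo_clear_bare(both) == DEMO_EXTFLGBIT);
}
```

With the bare macro, the EXTFLG bit survives the "clear" because only the first operand is negated.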
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Sun Nov 11 22:33:49 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 11 Nov 2007 22:33:59 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAC6Xi25017022 for ; Sun, 11 Nov 2007 22:33:47 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA06576; Mon, 12 Nov 2007 17:33:46 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAC6XidD102962624; Mon, 12 Nov 2007 17:33:46 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAC6XhGZ102799165; Mon, 12 Nov 2007 17:33:43 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Mon, 12 Nov 2007 17:33:43 +1100 From: David Chinner To: xaiki@sgi.com Cc: xfs@oss.sgi.com Subject: Re: [[PATCH, RESEND]] reduce imaxpct for big filesystems, Message-ID: <20071112063343.GC66820511@sgi.com> References: <20071031233516.GB88034736@melbourne.sgi.com> <1194839329-22003-1-git-send-email-xaiki@sgi.com> <1194839329-22003-2-git-send-email-xaiki@sgi.com> <1194839329-22003-3-git-send-email-xaiki@sgi.com> <1194839329-22003-4-git-send-email-xaiki@sgi.com> <1194839329-22003-5-git-send-email-xaiki@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1194839329-22003-5-git-send-email-xaiki@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4750/Sun Nov 11 11:16:39 2007 on oss.sgi.com 
X-Virus-Status: Clean X-archive-position: 13631 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Mon, Nov 12, 2007 at 02:48:48PM +1100, xaiki@sgi.com wrote: > From: Niv Sardi > > imaxpct is set to 25% for FS < 1 TB, > then 5% for FS < 50 TB, > and then 1%. > > It is implemented as a step function in calc_default_imaxpct() > --- > xfsprogs/mkfs/xfs_mkfs.c | 19 +++++++++++++++++-- > 1 files changed, 17 insertions(+), 2 deletions(-) > > diff --git a/xfsprogs/mkfs/xfs_mkfs.c b/xfsprogs/mkfs/xfs_mkfs.c > index 3689eb7..78c2c77 100644 > --- a/xfsprogs/mkfs/xfs_mkfs.c > +++ b/xfsprogs/mkfs/xfs_mkfs.c > @@ -374,6 +374,21 @@ validate_log_size(__uint64_t logblocks, int blocklog, int min_logblocks) > } > } > > +int > +calc_default_imaxpct( static int > + int blocklog, > + __uint64_t dblocks) > +{ > + if (dblocks < TERABYTES(1, blocklog)) { > + return XFS_DFL_IMAXIMUM_PCT; > + } else if (dblocks < TERABYTES(50, blocklog)) { > + return 5; > + } > + > + return 1; > +} Comment explaining what it is doing? (i.e. what I had to explain to you in the first place ;). Cheers, Dave. 
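For reference, here is a minimal standalone sketch of the step function from the patch description (25% below 1 TB, 5% below 50 TB, 1% otherwise), with the kind of comment the review asks for. The names `demo_terabytes`, `DEMO_DFL_IMAXIMUM_PCT` and `demo_calc_default_imaxpct` are hypothetical stand-ins, the value 25 is taken from the patch description, and the rationale in the comment is an assumption rather than the author's wording.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed default, taken from the patch description ("25% for FS < 1 TB"). */
#define DEMO_DFL_IMAXIMUM_PCT 25

/* Hypothetical helper: filesystem blocks in `tb` terabytes for a given log2 block size. */
static uint64_t demo_terabytes(uint64_t tb, int blocklog)
{
    return (tb << 40) >> blocklog;
}

/*
 * Pick the default maximum percentage of space that inodes may use.
 * Because the limit is a percentage, it grows with the filesystem;
 * the steps shrink it for large filesystems (assumed rationale):
 *   below 1 TB  -> 25%
 *   below 50 TB -> 5%
 *   otherwise   -> 1%
 */
int demo_calc_default_imaxpct(int blocklog, uint64_t dblocks)
{
    assert(blocklog >= 9 && blocklog <= 16);

    if (dblocks < demo_terabytes(1, blocklog))
        return DEMO_DFL_IMAXIMUM_PCT;
    if (dblocks < demo_terabytes(50, blocklog))
        return 5;
    return 1;
}
```

The boundaries are exclusive on the low side: a filesystem of exactly 1 TB gets 5%, and one of exactly 50 TB gets 1%.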
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Mon Nov 12 05:23:58 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 12 Nov 2007 05:25:40 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lACDNps9028219 for ; Mon, 12 Nov 2007 05:23:53 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id UAA09159; Mon, 12 Nov 2007 20:01:50 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAC91ndD102997794; Mon, 12 Nov 2007 20:01:49 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAC91lpQ102945493; Mon, 12 Nov 2007 20:01:47 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Mon, 12 Nov 2007 20:01:47 +1100 From: David Chinner To: xaiki@sgi.com Cc: xfs@oss.sgi.com Subject: Re: [[PATCH, RESEND]] less AGs for single disks configs. 
Message-ID: <20071112090147.GD66820511@sgi.com> References: <20071031233516.GB88034736@melbourne.sgi.com> <1194839329-22003-1-git-send-email-xaiki@sgi.com> <1194839329-22003-2-git-send-email-xaiki@sgi.com> <1194839329-22003-3-git-send-email-xaiki@sgi.com> <1194839329-22003-4-git-send-email-xaiki@sgi.com> <1194839329-22003-5-git-send-email-xaiki@sgi.com> <1194839329-22003-6-git-send-email-xaiki@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1194839329-22003-6-git-send-email-xaiki@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4752/Mon Nov 12 00:00:52 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13632 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Mon, Nov 12, 2007 at 02:48:49PM +1100, xaiki@sgi.com wrote: > From: Niv Sardi > > get the underlying structure with get_subvol_stripe_wrapper(), > and pass sunit | swidth as an argument to calc_default_ag_geometry(). > > if it is set, get the AG sizes bigger. > > this also cleans up a typo: > - } else if (daflag) /* User-specified AG size */ > + } else if (daflag) /* User-specified AG count */ No need to mention that you are cleaning up a typo in the description ;) > --- > xfsprogs/mkfs/xfs_mkfs.c | 18 ++++++++++++------ > 1 files changed, 12 insertions(+), 6 deletions(-) > > diff --git a/xfsprogs/mkfs/xfs_mkfs.c b/xfsprogs/mkfs/xfs_mkfs.c > index 78c2c77..4cf9975 100644 > --- a/xfsprogs/mkfs/xfs_mkfs.c > +++ b/xfsprogs/mkfs/xfs_mkfs.c > @@ -393,6 +393,7 @@ void > calc_default_ag_geometry( > int blocklog, > __uint64_t dblocks, > + int multidisk, > __uint64_t *agsize, > __uint64_t *agcount) > { > @@ -428,12 +429,13 @@ calc_default_ag_geometry( > * > * This scales us up smoothly between min/max AG sizes. 
>  	 */
> +
>  	if (dblocks > GIGABYTES(512, blocklog))
> -		shift = 5;
> +		shift = 5 - (multidisk == 0);
>  	else if (dblocks > GIGABYTES(8, blocklog))
> -		shift = 4;
> +		shift = 4 - (multidisk == 0);
>  	else if (dblocks >= MEGABYTES(128, blocklog))
> -		shift = 3;
> +		shift = 3 - (multidisk == 0);
>  	else
>  		ASSERT(0);

Ok, so now we end up with half the number of allocation groups
at these different sizes. That's not exactly what I had in mind.
Basically, what you've done works out as:

	 > 512GB	old = 32 AGs, new = 16 AGs
	 > 8GB		old = 16 AGs, new = 8 AGs
	>= 128MB	old = 8 AGs, new = 4 AGs

On an 8GB filesystem we still get 8 AGs, which is far too many.
On a 750GB disk, we still get 16 AGs, which is far too many.

A single spindle, regardless of its size, will have similar
seek characteristics, so scaling the number of AGs with size
is the wrong thing to do - you don't get better parallelism
out of a single spindle, just more seeks and lower performance.
Hence keeping the number of AGs fixed up to the point where
the AG size tops out (i.e. 4TB) seems like a better scaling
factor to me. i.e. something like:

	if (!multidisk) {
		if (dblocks >= TERABYTES(4, blocklog)) {
			blocks = XFS_AG_MAX_BLOCKS(blocklog);
			goto done;
		}
		agcount = 4;
		/* work out ag size here */
		goto done;
	}

I'd also like to see some test results showing the mkfs output
for the different configurations to confirm it works correctly
(i.e. that the corner cases work correctly).

Cheers,

Dave.
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Mon Nov 12 06:18:25 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 12 Nov 2007 06:19:26 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-0.7 required=5.0 tests=AWL,BAYES_40 autolearn=ham version=3.3.0-r574664 Received: from r2d2.neofacto.lu (mail.neofacto.lu [158.64.60.195]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lACEICKR008576 for ; Mon, 12 Nov 2007 06:18:16 -0800 Received: from localhost (localhost [127.0.0.1]) by r2d2.neofacto.lu (Postfix) with ESMTP id E39C6C3045 for ; Mon, 12 Nov 2007 13:20:24 +0100 (CET) X-Virus-Scanned: ClamAV 0.91.2/4752/Mon Nov 12 00:00:52 2007 on oss.sgi.com X-Virus-Scanned: Ubuntu amavisd-new at neofacto.lu Received: from r2d2.neofacto.lu ([127.0.0.1]) by localhost (r2d2.neofacto.lu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ec4w8O+pedUu for ; Mon, 12 Nov 2007 13:20:11 +0100 (CET) Received: by r2d2.neofacto.lu (Postfix, from userid 65534) id 8E628C306C; Mon, 12 Nov 2007 13:20:11 +0100 (CET) Received: from [192.168.1.166] (SU105.tudor.lu [158.64.4.205]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by r2d2.neofacto.lu (Postfix) with ESMTP id 84217C3045 for ; Mon, 12 Nov 2007 13:20:02 +0100 (CET) Message-ID: <473844ED.7080605@jamendo.com> Date: Mon, 12 Nov 2007 13:19:57 +0100 From: Amandine AUPETIT User-Agent: Thunderbird 2.0.0.6 (X11/20071022) MIME-Version: 1.0 To: xfs@oss.sgi.com Subject: xfs options Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Status: Clean X-archive-position: 13633 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: amandine@jamendo.com Precedence: bulk X-list: xfs Hi :) I have a 7tb XFS filesystem. 
The partition is supposed to stock big files between 2mo and 50mo. Quite no small files. So i'm trying to adapt the mount option to what I need, and what I need is security. I want to be sure that NONE of the files will be lost, even if there's a power failure. For now, I just tried the noatime mount option, I know there is a lot of options, but I don't really understand what they stand for. Do you have any suggestion to make my filesystem safer ? Thanks Amandine From owner-xfs@oss.sgi.com Mon Nov 12 06:54:23 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 12 Nov 2007 06:54:47 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-0.8 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS autolearn=ham version=3.3.0-r574664 Received: from lucidpixels.com (lucidpixels.com [75.144.35.66]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lACEsJrR013771 for ; Mon, 12 Nov 2007 06:54:22 -0800 Received: by lucidpixels.com (Postfix, from userid 1001) id 2795F1C00027F; Mon, 12 Nov 2007 09:54:25 -0500 (EST) Received: from localhost (localhost [127.0.0.1]) by lucidpixels.com (Postfix) with ESMTP id 23DEC4082F7D; Mon, 12 Nov 2007 09:54:25 -0500 (EST) Date: Mon, 12 Nov 2007 09:54:25 -0500 (EST) From: Justin Piszcz X-X-Sender: jpiszcz@p34.internal.lan To: Amandine AUPETIT cc: xfs@oss.sgi.com Subject: Re: xfs options In-Reply-To: <473844ED.7080605@jamendo.com> Message-ID: References: <473844ED.7080605@jamendo.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Virus-Scanned: ClamAV 0.91.2/4754/Mon Nov 12 06:03:00 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13634 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jpiszcz@lucidpixels.com Precedence: bulk X-list: xfs On Mon, 12 Nov 2007, Amandine AUPETIT wrote: > Hi :) > > I have a 7tb XFS filesystem. 
The partition is supposed to stock big files > between 2mo and 50mo. Quite no small files. > > So i'm trying to adapt the mount option to what I need, and what I need is > security. I want to be sure that NONE of the files will be lost, even if > there's a power failure. > For now, I just tried the noatime mount option, I know there is a lot of > options, but I don't really understand what they stand for. > > Do you have any suggestion to make my filesystem safer ? > > Thanks > > Amandine > > Put the system and array on a UPS. There are also some other tweaks, but nothing is as important as putting it on a UPS. Justin. From owner-xfs@oss.sgi.com Mon Nov 12 06:57:18 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 12 Nov 2007 06:57:41 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-0.8 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS autolearn=ham version=3.3.0-r574664 Received: from lucidpixels.com (lucidpixels.com [75.144.35.66]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lACEvFlK014329 for ; Mon, 12 Nov 2007 06:57:17 -0800 Received: by lucidpixels.com (Postfix, from userid 1001) id EB5EF1C00027F; Mon, 12 Nov 2007 09:57:21 -0500 (EST) Received: from localhost (localhost [127.0.0.1]) by lucidpixels.com (Postfix) with ESMTP id E65964082F7D; Mon, 12 Nov 2007 09:57:21 -0500 (EST) Date: Mon, 12 Nov 2007 09:57:21 -0500 (EST) From: Justin Piszcz X-X-Sender: jpiszcz@p34.internal.lan To: David Chinner cc: xaiki@sgi.com, xfs@oss.sgi.com Subject: Re: [[PATCH, RESEND]] less AGs for single disks configs. 
In-Reply-To: <20071112090147.GD66820511@sgi.com> Message-ID: References: <20071031233516.GB88034736@melbourne.sgi.com> <1194839329-22003-1-git-send-email-xaiki@sgi.com> <1194839329-22003-2-git-send-email-xaiki@sgi.com> <1194839329-22003-3-git-send-email-xaiki@sgi.com> <1194839329-22003-4-git-send-email-xaiki@sgi.com> <1194839329-22003-5-git-send-email-xaiki@sgi.com> <1194839329-22003-6-git-send-email-xaiki@sgi.com> <20071112090147.GD66820511@sgi.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Virus-Scanned: ClamAV 0.91.2/4754/Mon Nov 12 06:03:00 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13635 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jpiszcz@lucidpixels.com Precedence: bulk X-list: xfs On Mon, 12 Nov 2007, David Chinner wrote: > On Mon, Nov 12, 2007 at 02:48:49PM +1100, xaiki@sgi.com wrote: >> From: Niv Sardi >> >> get the underlying structure with get_subvol_stripe_wrapper(), >> and pass sunit | swidth as an argument to calc_default_ag_geometry(). >> >> if it is set, get the AG sizes bigger. >> >> this also cleans up a typo: >> - } else if (daflag) /* User-specified AG size */ >> + } else if (daflag) /* User-specified AG count */ > > No need to mention that you are cleaning up a typo in the description ;) > >> --- >> xfsprogs/mkfs/xfs_mkfs.c | 18 ++++++++++++------ >> 1 files changed, 12 insertions(+), 6 deletions(-) >> >> diff --git a/xfsprogs/mkfs/xfs_mkfs.c b/xfsprogs/mkfs/xfs_mkfs.c >> index 78c2c77..4cf9975 100644 >> --- a/xfsprogs/mkfs/xfs_mkfs.c >> +++ b/xfsprogs/mkfs/xfs_mkfs.c >> @@ -393,6 +393,7 @@ void >> calc_default_ag_geometry( >> int blocklog, >> __uint64_t dblocks, >> + int multidisk, >> __uint64_t *agsize, >> __uint64_t *agcount) >> { >> @@ -428,12 +429,13 @@ calc_default_ag_geometry( >> * >> * This scales us up smoothly between min/max AG sizes. 
>>  	 */
>> +
>>  	if (dblocks > GIGABYTES(512, blocklog))
>> -		shift = 5;
>> +		shift = 5 - (multidisk == 0);
>>  	else if (dblocks > GIGABYTES(8, blocklog))
>> -		shift = 4;
>> +		shift = 4 - (multidisk == 0);
>>  	else if (dblocks >= MEGABYTES(128, blocklog))
>> -		shift = 3;
>> +		shift = 3 - (multidisk == 0);
>>  	else
>>  		ASSERT(0);
>
> Ok, so now we end up with half the number of allocation groups
> at these different sizes. That's not exactly what I had in mind.
> Basically, what you've done works out as:
>
> 	 > 512GB	old = 32 AGs, new = 16 AGs
> 	 > 8GB		old = 16 AGs, new = 8 AGs
> 	>= 128MB	old = 8 AGs, new = 4 AGs
>
> On an 8GB filesystem we still get 8 AGs, which is far too many.
> On a 750GB disk, we still get 16 AGs, which is far too many.
>
> A single spindle, regardless of its size, will have similar
> seek characteristics, so scaling the number of AGs with size
> is the wrong thing to do - you don't get better parallelism
> out of a single spindle, just more seeks and lower performance.
> Hence keeping the number of AGs fixed up to the point where
> the AG size tops out (i.e. 4TB) seems like a better scaling
> factor to me. i.e. something like:
>
> 	if (!multidisk) {
> 		if (dblocks >= TERABYTES(4, blocklog)) {
> 			blocks = XFS_AG_MAX_BLOCKS(blocklog);
> 			goto done;
> 		}
> 		agcount = 4;
> 		/* work out ag size here */
> 		goto done;
> 	}
>
> I'd also like to see some test results showing the mkfs output
> for the different configurations to confirm it works correctly
> (i.e. that the corner cases work correctly).
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group
>

Dave, once this is in place, would you recommend that people with a
750GiB drive, or with a RAID5 array smaller than 2TB, re-format their
XFS partitions? Would one see any increase in speed?

Justin.
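The single-disk rule proposed in the quoted review (a fixed four AGs until the AG size tops out at a 4TB filesystem, then maximum-size AGs) can be sketched as a standalone function. This is only an illustration under stated assumptions: the `demo_*` names are hypothetical, the 1 TiB AG size limit is inferred from the "4 AGs top out at 4TB" remark, and the real mkfs code would also have to respect minimum AG size and last-AG constraints that are ignored here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed maximum AG size (1 TiB), inferred from "4 AGs top out at 4TB". */
#define DEMO_AG_MAX_BYTES (1ULL << 40)

/*
 * Hypothetical single-disk AG geometry:
 *  - at or above 4 maximum-size AGs, use maximum-size AGs;
 *  - below that, always split the device into four AGs.
 */
void demo_single_disk_ag_geometry(int blocklog, uint64_t dblocks,
                                  uint64_t *agsize, uint64_t *agcount)
{
    uint64_t max_ag_blocks = DEMO_AG_MAX_BYTES >> blocklog;

    assert(dblocks > 0);
    assert(agsize != 0 && agcount != 0);

    if (dblocks >= 4 * max_ag_blocks) {
        /* AG size has topped out; the AG count now grows with the device. */
        *agsize = max_ag_blocks;
    } else {
        /* Four AGs; round up so they cover the whole device. */
        *agsize = (dblocks + 3) / 4;
    }

    /* The last AG may be shorter than the others. */
    *agcount = (dblocks + *agsize - 1) / *agsize;
}
```

With 4KB blocks, a 750GB device comes out as 4 AGs under this sketch, compared with 16 under the patch as posted.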
From owner-xfs@oss.sgi.com Mon Nov 12 07:21:25 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 12 Nov 2007 07:21:57 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from relay00.pair.com (relay00.pair.com [209.68.5.9]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lACFLKmj017458 for ; Mon, 12 Nov 2007 07:21:24 -0800 Received: (qmail 38278 invoked from network); 12 Nov 2007 14:54:43 -0000 Received: from unknown (HELO harpe.intellique.com) (unknown) by unknown with SMTP; 12 Nov 2007 14:54:43 -0000 X-pair-Authenticated: 77.104.28.114 Date: Mon, 12 Nov 2007 15:54:48 +0100 From: Emmanuel Florac To: Amandine AUPETIT Cc: xfs@oss.sgi.com Subject: Re: xfs options Message-ID: <20071112155448.26a2b479@harpe.intellique.com> In-Reply-To: <473844ED.7080605@jamendo.com> References: <473844ED.7080605@jamendo.com> Organization: Intellique X-Mailer: Claws Mail 3.0.2 (GTK+ 2.10.13; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 X-Virus-Scanned: ClamAV 0.91.2/4754/Mon Nov 12 06:03:00 2007 on oss.sgi.com X-Virus-Status: Clean Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id lACFLQmj017466 X-archive-position: 13636 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: eflorac@intellique.com Precedence: bulk X-list: xfs Le Mon, 12 Nov 2007 13:19:57 +0100 Amandine AUPETIT écrivait: > I have a 7tb XFS filesystem. The partition is supposed to stock big > files between 2mo and 50mo. Quite no small files. > > So i'm trying to adapt the mount option to what I need, and what I > need is security. I want to be sure that NONE of the files will be > lost, even if there's a power failure. Unfortunately there isn't any such option. 
A power failure may be the cause of lost files on any filesystem. > For now, I just tried the noatime mount option, I know there is a lot > of options, but I don't really understand what they stand for. > > Do you have any suggestion to make my filesystem safer ? > Use redundant power supplies, and UPS; and make backups. Backups, backups, and backups. Keep a copy (with rsync or rsnapshot) on a similar server as your main backup, and a backup on tape. If your data is precious, JUST MAKE BACKUPS. -- ---------------------------------------- Emmanuel Florac | Intellique ----------------------------------------
From owner-xfs@oss.sgi.com Mon Nov 12 12:12:23 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 12 Nov 2007 12:12:32 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: * X-Spam-Status: No, score=1.0 required=5.0 tests=AWL,BAYES_40,RDNS_DYNAMIC autolearn=no version=3.3.0-r574664 Received: from ty.sabi.co.UK (82-69-39-138.dsl.in-addr.zen.co.uk [82.69.39.138]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lACKCJgm021636 for ; Mon, 12 Nov 2007 12:12:22 -0800 Received: from from [127.0.0.1] (helo=base.ty.sabi.co.UK) by ty.sabi.co.UK with esmtp(Exim 4.66 #1) id 1Irfdk-0002j5-EH for ; Mon, 12 Nov 2007 20:12:12 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <18232.45975.580521.962971@base.ty.sabi.co.UK> Date: Mon, 12 Nov 2007 20:12:07 +0000 X-Face: SMJE]JPYVBO-9UR%/8d'mG.F!@.,l@c[f'[%S8'BZIcbQc3/">GrXDwb#;fTRGNmHr^JFb SAptvwWc,0+z+~p~"Gdr4H$(|N(yF(wwCM2bW0~U?HPEE^fkPGx^u[*[yV.gyB!hDOli}EF[\cW*S H&spRGFL}{`bj1TaD^l/"[ msn( /TH#THs{Hpj>)]f> Subject: Re: xfs options In-Reply-To: <473844ED.7080605@jamendo.com> References: <473844ED.7080605@jamendo.com> X-Mailer: VM 7.17 under 21.5 (beta28) XEmacs Lucid From: pg_xfs@xfs.for.sabi.co.UK (Peter Grandi) X-Disclaimer: This message contains only personal opinions X-Virus-Scanned: ClamAV 0.91.2/4757/Mon Nov 12 09:20:27 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13638 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: pg_xfs@xfs.for.sabi.co.UK Precedence: bulk X-list: xfs >>> On Mon, 12 Nov 2007 13:19:57 +0100, Amandine AUPETIT >>> said: amandine> Hi :) I have a 7tb XFS filesystem. [ ... ] We dearly hope that you are using a 64 bit kernel, BTW.

amandine> So i'm trying to adapt the mount option to what I
amandine> need, and what I need is security. I want to be sure
amandine> that NONE of the files will be lost, even if there's a
amandine> power failure.

That's a very difficult goal, and one that requires quite a lot
of effort: carefully choosing and testing the hardware components
you use, and carefully writing your applications.

amandine> For now, I just tried the noatime mount option, I know
amandine> there is a lot of options, but I don't really
amandine> understand what they stand for.

Then you are not yet in a position to even judge whether the
following points make sense :-).

amandine> Do you have any suggestion to make my filesystem safer
amandine> ?

Making your *filesystem* safer seems a bit pointless to me; what
seems to matter to you is "NONE of the files will be lost", and
as to that I am assuming that you mean that even if the file has
just been written.

Then the most important and essential thing is to ensure that the
whole storage chain (application, kernel, host adapter, disk
drive) supports disabling (or flushing) its caches, and that
every element of the chain returns and passes up reliable success
or error indications.

Achieving this is not at all simple; many manufacturers do not
supply this information about their products, or they get it
wrong, or the implementation is buggy no matter what the
documentation says. Finding a storage chain that can reliably
flush data from an application to the storage medium, or return a
reliable error indication if that fails, is far from easy.

Anyhow, in particular it is very important that the application
writing the files avoids caching (e.g. no use of 'stdio') or
uses software barriers (e.g. 'fflush' and 'fsync') at the right
times (which can be just after every write).
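
As an illustration of the 'fflush' then 'fsync' sequence mentioned
above, here is a minimal sketch in C. The function name
write_durably is hypothetical, chosen only for this example. It
assumes a POSIX system, and it only helps if every layer below the
kernel honours the flush request.

```c
#define _POSIX_C_SOURCE 200809L	/* for fileno() and fsync() */
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes from buf to path and return 0
 * only if every step, including the flush to the device, reported
 * success. Returns -1 on any failure.
 */
int write_durably(const char *path, const void *buf, size_t len)
{
	assert(path != NULL && (buf != NULL || len == 0));

	FILE *fp = fopen(path, "wb");
	if (fp == NULL)
		return -1;

	int ok = 1;

	if (len > 0 && fwrite(buf, 1, len, fp) != len)
		ok = 0;
	/* fflush() moves data from the stdio buffer into the kernel. */
	if (ok && fflush(fp) != 0)
		ok = 0;
	/* fsync() asks the kernel to push the file out to the device. */
	if (ok && fsync(fileno(fp)) != 0)
		ok = 0;
	/* fclose() can fail too; treat that as a failed write. */
	if (fclose(fp) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A caller would treat any non-zero return as "not known to be on
disk". For a newly created file the containing directory may need
an fsync() of its own before the name is durable; that step is
left out here to keep the sketch short.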

Then by far the safest way to set things up, if the above
conditions have been met, is to disable write caching everywhere:

* To disable caching in the host adapter/HBA, there is usually an
  option in the BIOS/fw.

* To disable caching in the individual disks, use 'hdparm -W0' or
  equivalent.

* To disable kernel caching, mount the filesystem with '-o sync'.

This will make writes *much* slower (typically at least ten times
slower), but it is safe (except that partial writes may still
fail, and to prevent that battery backup for every element of the
storage chain is needed).

If the decrease in speed is too much, there are less safe but
still fairly good approaches based on using periodic write
barriers instead of simply disabling caching. This requires
ensuring that the whole storage chain supports fine or coarse
grained write barriers with proper error reporting, which again
is far from trivial, and in addition there must be sufficient
battery backup for every element of the storage chain in which
data may be cached until the write barrier is exercised.
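
A rough application-level analogue of the periodic barrier idea is
to batch writes and force them out every N records. The sketch
below assumes a POSIX system; append_with_periodic_sync, write_all
and the sync_every parameter are hypothetical names for this
example. It is not the storage-chain write barrier discussed above,
and records written since the last successful fdatasync() can
still be lost on power failure.

```c
#define _POSIX_C_SOURCE 200809L	/* for fdatasync() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Write all len bytes, retrying short writes and EINTR. 0 or -1. */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Append the same record `count` times to path, calling fdatasync()
 * after every `sync_every` records and once more for any leftover.
 * Returns the number of fdatasync() calls made, or -1 on error.
 */
int append_with_periodic_sync(const char *path, const char *rec,
			      size_t reclen, int count, int sync_every)
{
	assert(sync_every > 0 && count >= 0);

	int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
	if (fd < 0)
		return -1;

	int syncs = 0;
	int pending = 0;	/* records written since the last sync */

	for (int i = 0; i < count; i++) {
		if (write_all(fd, rec, reclen) != 0)
			goto fail;
		if (++pending == sync_every) {
			if (fdatasync(fd) != 0)
				goto fail;
			syncs++;
			pending = 0;
		}
	}
	if (pending > 0) {
		if (fdatasync(fd) != 0)
			goto fail;
		syncs++;
	}
	if (close(fd) != 0)
		return -1;
	return syncs;

fail:
	close(fd);
	return -1;
}
```

The trade-off is the one described above: a larger sync_every means
fewer flushes and more speed, but a larger window of records that a
power failure can take with it.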

This thread may be interesting:

  http://OSS.SGI.com/archives/xfs/2007-03/msg00157.html

And perhaps these FAQs:

  http://OSS.SGI.com/projects/xfs/faq.html#wcache

From owner-xfs@oss.sgi.com Mon Nov 12 12:31:57 2007
Received: with ECARTIS (v1.0.0; list xfs); Mon, 12 Nov 2007 12:32:06 -0800 (PST)
X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham
	version=3.3.0-r574664
Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130])
	by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lACKVqKU028467
	for ; Mon, 12 Nov 2007 12:31:56 -0800
Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149])
	by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id HAA23403;
	Tue, 13 Nov 2007 07:31:51 +1100
Received: from snort.melbourne.sgi.com (localhost [127.0.0.1])
	by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lACKVndD103152813;
	Tue, 13 Nov 2007 07:31:50 +1100 (AEDT)
Received: (from dgc@localhost)
	by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lACKVlvl103178120;
	Tue, 13 Nov 2007 07:31:47 +1100 (AEDT)
X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f
Date: Tue, 13 Nov 2007 07:31:47 +1100
From: David Chinner
To: Justin Piszcz
Cc: xaiki@sgi.com, xfs@oss.sgi.com
Subject: Re: [[PATCH, RESEND]] less AGs for single disks configs.
Message-ID: <20071112203147.GA995458@sgi.com>
References: <20071031233516.GB88034736@melbourne.sgi.com>
	<1194839329-22003-1-git-send-email-xaiki@sgi.com>
	<1194839329-22003-2-git-send-email-xaiki@sgi.com>
	<1194839329-22003-3-git-send-email-xaiki@sgi.com>
	<1194839329-22003-4-git-send-email-xaiki@sgi.com>
	<1194839329-22003-5-git-send-email-xaiki@sgi.com>
	<1194839329-22003-6-git-send-email-xaiki@sgi.com>
	<20071112090147.GD66820511@sgi.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.4.2.1i
X-Virus-Scanned: ClamAV 0.91.2/4757/Mon Nov 12 09:20:27 2007 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 13639
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: dgc@sgi.com
Precedence: bulk
X-list: xfs

On Mon, Nov 12, 2007 at 09:57:21AM -0500, Justin Piszcz wrote:
> On Mon, 12 Nov 2007, David Chinner wrote:
> >A single spindle, regardless of its size, will have similar
> >seek characteristics so scaling the number of AGs with size
> >is the wrong thing to do - you don't get better parallelism
> >out of a single spindle, just more seeks and lower performance.
> >hence keeping the number of AGs fixed up to the point where
> >the AG size tops out (i.e. 4TB) seems like a better scaling
> >factor to me. i.e. something like:
> >
> >
> >	if (!multidisk) {
> >		if (dblocks >= TERABYTES(4, blocklog)) {
> >			blocks = XFS_AG_MAX_BLOCKS(blocklog);
> >			goto done;
> >		}
> >		agcount = 4;
> >		/* work out ag size here */
> >		goto done;
> >	}
> >
> >I'd also like to see some test results showing the mkfs output
> >for the different configurations to confirm it works correctly
> >(i.e. that the corner cases work correctly).
>
> Dave, when this is put into place do you recommend people re-format their
> XFS partitions for those with a 750GiB drive -or- with a < 2TB RAID5
> array,

No.
If you are having performance problems, then changing the way the
filesystem is laid out *may* improve performance but if everything
is working fine then don't change it.

> would one see any increase in speed?

On a single disk, yes. On RAID5 - who knows. There are so many other
variables to raid5 performance (esp software raid) that such single
disk optimisations could degrade performance. On other RAID hardware,
it might improve - it really depends on the RAID implementation....

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

From owner-xfs@oss.sgi.com Mon Nov 12 16:51:03 2007
Received: with ECARTIS (v1.0.0; list xfs); Mon, 12 Nov 2007 16:51:06 -0800 (PST)
X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-2.2 required=5.0 tests=AWL,BAYES_00 autolearn=ham
	version=3.3.0-r574664
Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130])
	by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAD0ox7J026670
	for ; Mon, 12 Nov 2007 16:51:02 -0800
Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149])
	by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA00359;
	Tue, 13 Nov 2007 11:51:02 +1100
Received: from snort.melbourne.sgi.com (localhost [127.0.0.1])
	by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAD0p1dD104054831;
	Tue, 13 Nov 2007 11:51:01 +1100 (AEDT)
Received: (from xaiki@localhost)
	by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAD0p0Y8103929630;
	Tue, 13 Nov 2007 11:51:00 +1100 (AEDT)
Date: Tue, 13 Nov 2007 11:51:00 +1100
From: Niv Sardi
To: David Chinner
Cc: xfs@oss.sgi.com
Subject: Re: [[PATCH, RESEND]] V2 inodes per default, and move DFL bits to XFS_DFL_SB_VERSION_BITS,
Message-ID: <20071113005059.GA101270606@melbourne.sgi.com>
References: <20071031233516.GB88034736@melbourne.sgi.com>
	<1194839329-22003-1-git-send-email-xaiki@sgi.com>
	<1194839329-22003-2-git-send-email-xaiki@sgi.com>
	<1194839329-22003-3-git-send-email-xaiki@sgi.com>
	<1194839329-22003-4-git-send-email-xaiki@sgi.com>
	<20071112063136.GB66820511@sgi.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20071112063136.GB66820511@sgi.com>
User-Agent: Mutt/1.4.2.1i
X-Virus-Scanned: ClamAV 0.91.2/4757/Mon Nov 12 09:20:27 2007 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 13640
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: xaiki@sgi.com
Precedence: bulk
X-list: xfs

* David Chinner [2007-11-12 17:31:36 +1100]:
> On Mon, Nov 12, 2007 at 02:48:47PM +1100, xaiki@sgi.com wrote:
> I suggest splitting this over multiple lines and calling it
> XFS_DEFAULT_SB_VERSION_BITS.

The rest of the file labels defaults as DFL, I feel it's better to
keep it that way.

--
Niv

From owner-xfs@oss.sgi.com Mon Nov 12 20:11:12 2007
Received: with ECARTIS (v1.0.0; list xfs); Mon, 12 Nov 2007 20:11:25 -0800 (PST)
X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-2.3 required=5.0 tests=AWL,BAYES_00 autolearn=ham
	version=3.3.0-r574664
Received: from itchy (dhcp17.melbourne.sgi.com [134.14.55.17])
	by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAD4B9aZ012368
	for ; Mon, 12 Nov 2007 20:11:11 -0800
Received: by itchy (Postfix, from userid 16403) id A25F5BB05;
	Tue, 13 Nov 2007 15:10:57 +1100 (EST)
From: xaiki@sgi.com
To: xfs@oss.sgi.com
Cc: Niv Sardi
Subject: [PATCH TAKE 2 1/6] Default to log version 2
Date: Tue, 13 Nov 2007 15:10:52 +1100
Message-Id: <1194927057-26415-2-git-send-email-xaiki@sgi.com>
X-Mailer: git-send-email 1.5.3.5
In-Reply-To: <1194927057-26415-1-git-send-email-xaiki@sgi.com>
References: <20071029075657.GA84369978@melbourne.sgi.com>
	<1194927057-26415-1-git-send-email-xaiki@sgi.com>
X-Virus-Scanned: ClamAV 0.91.2/4757/Mon Nov 12
	09:20:27 2007 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 13645
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: xaiki@sgi.com
Precedence: bulk
X-list: xfs

From: Niv Sardi

Change logversion to 2 in xfs_mkfs.c
---
 xfsprogs/mkfs/xfs_mkfs.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/xfsprogs/mkfs/xfs_mkfs.c b/xfsprogs/mkfs/xfs_mkfs.c
index 6e84a4e..5f3299d 100644
--- a/xfsprogs/mkfs/xfs_mkfs.c
+++ b/xfsprogs/mkfs/xfs_mkfs.c
@@ -686,7 +686,7 @@ main(
 	ilflag = imflag = ipflag = isflag = 0;
 	liflag = laflag = lsflag = ldflag = lvflag = 0;
 	loginternal = 1;
-	logversion = 1;
+	logversion = 2;
 	logagno = logblocks = rtblocks = rtextblocks = 0;
 	Nflag = nlflag = nsflag = nvflag = 0;
 	dirblocklog = dirblocksize = dirversion = 0;
--
1.5.3.5

From owner-xfs@oss.sgi.com Mon Nov 12 20:11:13 2007
Received: with ECARTIS (v1.0.0; list xfs); Mon, 12 Nov 2007 20:11:26 -0800 (PST)
X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-2.3 required=5.0 tests=AWL,BAYES_00 autolearn=ham
	version=3.3.0-r574664
Received: from itchy (dhcp17.melbourne.sgi.com [134.14.55.17])
	by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAD4B93Y012369
	for ; Mon, 12 Nov 2007 20:11:11 -0800
Received: by itchy (Postfix, from userid 16403) id AD543BB07;
	Tue, 13 Nov 2007 15:10:57 +1100 (EST)
From: xaiki@sgi.com
To: xfs@oss.sgi.com
Cc: Niv Sardi
Subject: [PATCH TAKE 2 6/6] less AGs for single disks configs.
Date: Tue, 13 Nov 2007 15:10:57 +1100
Message-Id: <1194927057-26415-7-git-send-email-xaiki@sgi.com>
X-Mailer: git-send-email 1.5.3.5
In-Reply-To: <1194927057-26415-6-git-send-email-xaiki@sgi.com>
References: <20071029075657.GA84369978@melbourne.sgi.com>
	<1194927057-26415-1-git-send-email-xaiki@sgi.com>
	<1194927057-26415-2-git-send-email-xaiki@sgi.com>
	<1194927057-26415-3-git-send-email-xaiki@sgi.com>
	<1194927057-26415-4-git-send-email-xaiki@sgi.com>
	<1194927057-26415-5-git-send-email-xaiki@sgi.com>
	<1194927057-26415-6-git-send-email-xaiki@sgi.com>
X-Virus-Scanned: ClamAV 0.91.2/4757/Mon Nov 12 09:20:27 2007 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 13647
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: xaiki@sgi.com
Precedence: bulk
X-list: xfs

From: Niv Sardi

Get the underlying structure with get_subvol_stripe_wrapper(), and
pass sunit | swidth as the multidisk argument to
calc_default_ag_geometry().

If it is not set, we are on a single disk: use XFS_AG_MAX_BLOCKS as
the AG size for FS >= 4TB and calculate the AG count from that; use
4 AGs for FS < 4TB.

We calculate from blocks or count, whichever we have, and add an
assert to ensure we have one of the two.
---
 xfsprogs/mkfs/xfs_mkfs.c |   28 ++++++++++++++++++++++++----
 1 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/xfsprogs/mkfs/xfs_mkfs.c b/xfsprogs/mkfs/xfs_mkfs.c
index a8af6e2..e15d667 100644
--- a/xfsprogs/mkfs/xfs_mkfs.c
+++ b/xfsprogs/mkfs/xfs_mkfs.c
@@ -401,10 +401,11 @@ void
 calc_default_ag_geometry(
 	int		blocklog,
 	__uint64_t	dblocks,
+	int		multidisk,
 	__uint64_t	*agsize,
 	__uint64_t	*agcount)
 {
-	__uint64_t	blocks;
+	__uint64_t	blocks = 0;
 	__uint64_t	count = 0;
 	int		shift = 0;

@@ -436,6 +437,17 @@ calc_default_ag_geometry(
 	 *
 	 * This scales us up smoothly between min/max AG sizes.
 	 */
+
+	if (!multidisk) {
+		if (dblocks >= TERABYTES(4, blocklog)) {
+			blocks = XFS_AG_MAX_BLOCKS(blocklog);
+			goto done;
+		}
+		count = 4;
+
+		goto done;
+	}
+
 	if (dblocks > GIGABYTES(512, blocklog))
 		shift = 5;
 	else if (dblocks > GIGABYTES(8, blocklog))
@@ -447,8 +459,12 @@ calc_default_ag_geometry(
 	blocks = dblocks >> shift;

 done:
+	ASSERT (count || blocks);
 	if (!count)
 		count = dblocks / blocks + (dblocks % blocks != 0);
+	if (!blocks)
+		blocks = dblocks / count;
+
 	*agsize = blocks;
 	*agcount = count;
 }
@@ -1779,10 +1795,14 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
 		agsize /= blocksize;
 		agcount = dblocks / agsize + (dblocks % agsize != 0);

-	} else if (daflag)	/* User-specified AG size */
+	} else if (daflag)	/* User-specified AG count */
 		agsize = dblocks / agcount + (dblocks % agcount != 0);
-	else
-		calc_default_ag_geometry(blocklog, dblocks, &agsize, &agcount);
+	else {
+		get_subvol_stripe_wrapper(dfile, SVTYPE_DATA,
+				&xlv_dsunit, &xlv_dswidth, &sectoralign);
+		calc_default_ag_geometry(blocklog, dblocks, xlv_dsunit | xlv_dswidth,
+				&agsize, &agcount);
+	}

 	/*
 	 * If the last AG is too small, reduce the filesystem size
--
1.5.3.5

From owner-xfs@oss.sgi.com Mon Nov 12 20:11:07 2007
Received: with ECARTIS (v1.0.0; list xfs); Mon, 12 Nov 2007 20:11:25 -0800 (PST)
X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_34,
	J_CHICKENPOX_35,J_CHICKENPOX_36,J_CHICKENPOX_37,J_CHICKENPOX_38,
	J_CHICKENPOX_39,J_CHICKENPOX_43,J_CHICKENPOX_47,J_CHICKENPOX_51,
	J_CHICKENPOX_71,J_CHICKENPOX_75 autolearn=no version=3.3.0-r574664
Received: from itchy (dhcp17.melbourne.sgi.com [134.14.55.17])
	by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAD4B2KM012299
	for ; Mon, 12 Nov 2007 20:11:05 -0800
Received: by itchy (Postfix, from userid 16403) id 9F76FBB03;
	Tue, 13 Nov 2007 15:10:57 +1100 (EST)
From: xaiki@sgi.com
To:
	xfs@oss.sgi.com
Cc: Niv Sardi
Subject: [PATCH TAKE 2 3/6] Drop the ability to turn unwritten extents off completely
Date: Tue, 13 Nov 2007 15:10:54 +1100
Message-Id: <1194927057-26415-4-git-send-email-xaiki@sgi.com>
X-Mailer: git-send-email 1.5.3.5
In-Reply-To: <1194927057-26415-3-git-send-email-xaiki@sgi.com>
References: <20071029075657.GA84369978@melbourne.sgi.com>
	<1194927057-26415-1-git-send-email-xaiki@sgi.com>
	<1194927057-26415-2-git-send-email-xaiki@sgi.com>
	<1194927057-26415-3-git-send-email-xaiki@sgi.com>
X-Virus-Scanned: ClamAV 0.91.2/4757/Mon Nov 12 09:20:27 2007 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 13644
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: xaiki@sgi.com
Precedence: bulk
X-list: xfs

From: Niv Sardi

Turning unwritten extents off on Linux is generally a bad idea, so
this option should not be used.

Remove the option from xfs_mkfs.c: remove it from the option list
and from the mkfs output.

Update the mkfs.xfs manpage.
---
 xfsprogs/doc/CHANGES          |    1 +
 xfsprogs/growfs/xfs_growfs.c  |   11 ++++-------
 xfsprogs/man/man8/mkfs.xfs.8  |   16 ----------------
 xfsprogs/man/man8/xfs_admin.8 |    3 ++-
 xfsprogs/mkfs/xfs_mkfs.c      |   38 +++++++++++++-------------------------
 xfsprogs/mkfs/xfs_mkfs.h      |    6 +++---
 6 files changed, 23 insertions(+), 52 deletions(-)

diff --git a/xfsprogs/doc/CHANGES b/xfsprogs/doc/CHANGES
index 1858a87..5a3e165 100644
--- a/xfsprogs/doc/CHANGES
+++ b/xfsprogs/doc/CHANGES
@@ -5,6 +5,7 @@ xfsprogs-2.9.x
 	  warning in certain device sizes.
 	- Man page fixes.
 	  Thanks to Utako Kusaka for this.
+	- Disable the ability to turn off unwritten extents in mkfs.

 xfsprogs-2.9.4 (7 Sep 2007)
 	- Fixed xfs_repair segfaulting with directory block size different
diff --git a/xfsprogs/growfs/xfs_growfs.c b/xfsprogs/growfs/xfs_growfs.c
index b029e1b..5767f10 100644
--- a/xfsprogs/growfs/xfs_growfs.c
+++ b/xfsprogs/growfs/xfs_growfs.c
@@ -58,7 +58,6 @@ report_info(
 	int		isint,
 	char		*logname,
 	char		*rtname,
-	int		unwritten,
 	int		lazycount,
 	int		dirversion,
 	int		logversion,
@@ -68,7 +67,7 @@ report_info(
 	    "meta-data=%-22s isize=%-6u agcount=%u, agsize=%u blks\n"
 	    "         =%-22s sectsz=%-5u attr=%u\n"
 	    "data     =%-22s bsize=%-6u blocks=%llu, imaxpct=%u\n"
-	    "         =%-22s sunit=%-6u swidth=%u blks, unwritten=%u\n"
+	    "         =%-22s sunit=%-6u swidth=%u blks\n"
 	    "naming   =version %-14u bsize=%-6u\n"
 	    "log      =%-22s bsize=%-6u blocks=%u, version=%u\n"
 	    "         =%-22s sectsz=%-5u sunit=%u blks, lazy-count=%u\n"
@@ -78,7 +77,7 @@ report_info(
 		"", geo.sectsize, attrversion,
 		"", geo.blocksize, (unsigned long long)geo.datablocks,
 			geo.imaxpct,
-		"", geo.sunit, geo.swidth, unwritten,
+		"", geo.sunit, geo.swidth,
 		dirversion, geo.dirblocksize,
 		isint ? _("internal") : logname ? logname : _("external"),
 			geo.blocksize, geo.logblocks, logversion,
@@ -115,7 +114,6 @@ main(int argc, char **argv)
 	xfs_fsop_geom_t		ngeo;	/* new fs geometry */
 	int			rflag;	/* -r flag */
 	long long		rsize;	/* new rt size in fs blocks */
-	int			unwritten; /* unwritten extent flag */
 	int			lazycount; /* lazy superblock counters */
 	int			xflag;	/* -x flag */
 	char			*fname;	/* mount point name */
@@ -236,7 +234,6 @@ main(int argc, char **argv)
 		}
 	}
 	isint = geo.logstart > 0;
-	unwritten = geo.flags & XFS_FSOP_GEOM_FLAGS_EXTFLG ? 1 : 0;
 	lazycount = geo.flags & XFS_FSOP_GEOM_FLAGS_LAZYSB ? 1 : 0;
 	dirversion = geo.flags & XFS_FSOP_GEOM_FLAGS_DIRV2 ? 2 : 1;
 	logversion = geo.flags & XFS_FSOP_GEOM_FLAGS_LOGV2 ? 2 : 1;
@@ -245,7 +242,7 @@ main(int argc, char **argv)

 	if (nflag) {
 		report_info(geo, datadev, isint, logdev, rtdev,
-				unwritten, lazycount, dirversion, logversion,
+				lazycount, dirversion, logversion,
 				attrversion);
 		exit(0);
 	}
@@ -282,7 +279,7 @@ main(int argc, char **argv)
 	}

 	report_info(geo, datadev, isint, logdev, rtdev,
-			unwritten, lazycount, dirversion, logversion,
+			lazycount, dirversion, logversion,
 			attrversion);

 	ddsize = xi.dsize;
diff --git a/xfsprogs/man/man8/mkfs.xfs.8 b/xfsprogs/man/man8/mkfs.xfs.8
index 0d27901..b6024c3 100644
--- a/xfsprogs/man/man8/mkfs.xfs.8
+++ b/xfsprogs/man/man8/mkfs.xfs.8
@@ -240,22 +240,6 @@ will automatically query the logical volume for appropriate
 and
 .B swidth
 values.
-.TP
-.BI unwritten[= value ]
-This is used to specify whether unwritten extents are flagged as such,
-or not.
-The
-.I value
-is either 0 or 1, with 1 signifying that unwritten
-extent flagging should occur.
-If the suboption is omitted, unwritten extent flagging is enabled.
-If unwritten extents are flagged, filesystem write performance
-will be negatively affected for preallocated file extents, since
-extra filesystem transactions are required to convert extent flags
-for the range of the file written.
-This suboption should be disabled if the filesystem
-needs to be used on operating system versions which do not support the
-flagging capability.
 .RE
 .TP
 .B \-f
diff --git a/xfsprogs/man/man8/xfs_admin.8 b/xfsprogs/man/man8/xfs_admin.8
index c0017b9..c38a942 100644
--- a/xfsprogs/man/man8/xfs_admin.8
+++ b/xfsprogs/man/man8/xfs_admin.8
@@ -31,7 +31,8 @@ command.
 .TP
 .B \-e
 Enables unwritten extent support on a filesystem that does not
-already have this enabled.
+already have this enabled (for legacy filesystems, it can't be
+disabled anymore at mkfs time).
 .TP
 .B \-f
 Specifies that the filesystem image to be processed is stored in a
diff --git a/xfsprogs/mkfs/xfs_mkfs.c b/xfsprogs/mkfs/xfs_mkfs.c
index b378800..3689eb7 100644
--- a/xfsprogs/mkfs/xfs_mkfs.c
+++ b/xfsprogs/mkfs/xfs_mkfs.c
@@ -56,25 +56,23 @@ char	*dopts[] = {
 	"sunit",
 #define	D_SWIDTH	5
 	"swidth",
-#define	D_UNWRITTEN	6
-	"unwritten",
-#define	D_AGSIZE	7
+#define	D_AGSIZE	6
 	"agsize",
-#define	D_SU		8
+#define	D_SU		7
 	"su",
-#define	D_SW		9
+#define	D_SW		8
 	"sw",
-#define	D_SECTLOG	10
+#define	D_SECTLOG	9
 	"sectlog",
-#define	D_SECTSIZE	11
+#define	D_SECTSIZE	10
 	"sectsize",
-#define	D_NOALIGN	12
+#define	D_NOALIGN	11
 	"noalign",
-#define	D_RTINHERIT	13
+#define	D_RTINHERIT	12
 	"rtinherit",
-#define	D_PROJINHERIT	14
+#define	D_PROJINHERIT	13
 	"projinherit",
-#define	D_EXTSZINHERIT	15
+#define	D_EXTSZINHERIT	14
 	"extszinherit",
 	NULL
 };
@@ -604,7 +602,6 @@ main(
 	int			dsw;
 	int			dsunit;
 	int			dswidth;
-	int			extent_flagging;
 	int			force_overwrite;
 	struct fsxattr		fsx;
 	int			iaflag;
@@ -697,7 +694,6 @@ main(
 	dsize = logsize = rtsize = rtextsize = protofile = NULL;
 	dsu = dsw = dsunit = dswidth = lalign = lsu = lsunit = 0;
 	nodsflag = norsflag = 0;
-	extent_flagging = 1;
 	force_overwrite = 0;
 	worst_freelist = 0;
 	lazy_sb_counters = 0;
@@ -877,14 +873,6 @@ main(
 						D_NOALIGN);
 					nodsflag = 1;
 					break;
-				case D_UNWRITTEN:
-					if (!value)
-						reqval('d', dopts, D_UNWRITTEN);
-					c = atoi(value);
-					if (c < 0 || c > 1)
-						illegal(value, "d unwritten");
-					extent_flagging = c;
-					break;
 				case D_SECTLOG:
 					if (!value)
 						reqval('d', dopts, D_SECTLOG);
@@ -1990,7 +1978,7 @@ an AG size that is one stripe unit smaller, for example %llu.\n"),
 		   "meta-data=%-22s isize=%-6d agcount=%lld, agsize=%lld blks\n"
 		   "         =%-22s sectsz=%-5u attr=%u\n"
 		   "data     =%-22s bsize=%-6u blocks=%llu, imaxpct=%u\n"
-		   "         =%-22s sunit=%-6u swidth=%u blks, unwritten=%u\n"
+		   "         =%-22s sunit=%-6u swidth=%u blks\n"
 		   "naming   =version %-14u bsize=%-6u\n"
 		   "log      =%-22s bsize=%-6d blocks=%lld, version=%d\n"
 		   "         =%-22s sectsz=%-5u sunit=%d blks, lazy-count=%d\n"
@@ -1999,7 +1987,7 @@ an AG size that is one stripe unit smaller, for example %llu.\n"),
 			"", sectorsize, attrversion,
 			"", blocksize, (long long)dblocks,
 				imflag ? imaxpct : XFS_DFL_IMAXIMUM_PCT,
-			"", dsunit, dswidth, extent_flagging,
+			"", dsunit, dswidth,
 			dirversion, dirversion == 1 ? blocksize : dirblocksize,
 			logfile, 1 << blocklog, (long long)logblocks,
 				logversion, "", lsectorsize, lsunit, lazy_sb_counters,
@@ -2066,7 +2054,7 @@ an AG size that is one stripe unit smaller, for example %llu.\n"),
 	}
 	sbp->sb_features2 = XFS_SB_VERSION2_MKFS(lazy_sb_counters, attrversion == 2, 0);
 	sbp->sb_versionnum = XFS_SB_VERSION_MKFS(
-			iaflag, dsunit != 0, extent_flagging,
+			iaflag, dsunit != 0,
 			dirversion == 2, logversion == 2, attrversion == 1,
 			(sectorsize != BBSIZE || lsectorsize != BBSIZE),
 			sbp->sb_features2 != 0);
@@ -2537,7 +2525,7 @@ usage( void )
 /* blocksize */		[-b log=n|size=num]\n\
 /* data subvol */	[-d agcount=n,agsize=n,file,name=xxx,size=num,\n\
 			    (sunit=value,swidth=value|su=num,sw=num),\n\
-			    sectlog=n|sectsize=num,unwritten=0|1]\n\
+			    sectlog=n|sectsize=num]\n\
 /* inode size */	[-i log=n|perblock=n|size=num,maxpct=n,attr=0|1|2]\n\
 /* log subvol */	[-l agnum=n,internal,size=num,logdev=xxx,version=n\n\
 			    sunit=value|su=num,sectlog=n|sectsize=num,\n\
diff --git a/xfsprogs/mkfs/xfs_mkfs.h b/xfsprogs/mkfs/xfs_mkfs.h
index 1ab85fd..f19f917 100644
--- a/xfsprogs/mkfs/xfs_mkfs.h
+++ b/xfsprogs/mkfs/xfs_mkfs.h
@@ -18,12 +18,12 @@
 #ifndef __XFS_MKFS_H__
 #define	__XFS_MKFS_H__

-#define XFS_SB_VERSION_MKFS(ia,dia,extflag,dir2,log2,attr1,sflag,more) (\
-	((ia)||(dia)||(extflag)||(dir2)||(log2)||(attr1)||(sflag)||(more)) ? \
+#define XFS_SB_VERSION_MKFS(ia,dia,dir2,log2,attr1,sflag,more) (\
+	((ia)||(dia)||(dir2)||(log2)||(attr1)||(sflag)||(more)) ? \
 	( XFS_SB_VERSION_4 | \
 		((ia) ? XFS_SB_VERSION_ALIGNBIT : 0) | \
 		((dia) ? XFS_SB_VERSION_DALIGNBIT : 0) | \
-		((extflag) ? XFS_SB_VERSION_EXTFLGBIT : 0) | \
+		(XFS_SB_VERSION_EXTFLGBIT) | \
 		((dir2) ? XFS_SB_VERSION_DIRV2BIT : 0) | \
 		((log2) ? XFS_SB_VERSION_LOGV2BIT : 0) | \
 		((attr1) ? XFS_SB_VERSION_ATTRBIT : 0) | \
--
1.5.3.5

From owner-xfs@oss.sgi.com Mon Nov 12 20:11:06 2007
Received: with ECARTIS (v1.0.0; list xfs); Mon, 12 Nov 2007 20:11:18 -0800 (PST)
X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-2.2 required=5.0 tests=AWL,BAYES_00 autolearn=ham
	version=3.3.0-r574664
Received: from itchy (dhcp17.melbourne.sgi.com [134.14.55.17])
	by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAD4B2Yp012296
	for ; Mon, 12 Nov 2007 20:11:05 -0800
Received: by itchy (Postfix, from userid 16403) id 79846BB01;
	Tue, 13 Nov 2007 15:10:57 +1100 (EST)
From: xaiki@sgi.com
To: xfs@oss.sgi.com
Subject: RESEND(2)
Date: Tue, 13 Nov 2007 15:10:51 +1100
Message-Id: <1194927057-26415-1-git-send-email-xaiki@sgi.com>
X-Mailer: git-send-email 1.5.3.5
In-Reply-To: <20071029075657.GA84369978@melbourne.sgi.com>
References: <20071029075657.GA84369978@melbourne.sgi.com>
X-Virus-Scanned: ClamAV 0.91.2/4757/Mon Nov 12 09:20:27 2007 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 13643
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: xaiki@sgi.com
Precedence: bulk
X-list: xfs

Added descriptions to first patches,
Added extended description for unwritten extents, updated manpages.
Refactored XFS_DFL_SB_VERSION_BITS as suggested by Dave.
Added in-code description for reduced imaxpct for big filesystems.
Made calc_default_imaxpct static, as suggested by Dave.
Removed mention of cleaning typos ;)
Implemented new logic as suggested by Dave.
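
The single-disk rule described in patch 6/6 can be tried out on its
own. This is a sketch under stated assumptions, not the xfsprogs
code: SKETCH_TERABYTES and SKETCH_AG_MAX_BLOCKS are stand-ins for
the TERABYTES() and XFS_AG_MAX_BLOCKS() macros, the 1 TiB maximum
AG size is an assumption, and sketch_single_disk_ag_geometry is a
hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the xfsprogs macros; the values are assumptions. */
#define SKETCH_TERABYTES(n, blocklog)	((uint64_t)(n) << (40 - (blocklog)))
#define SKETCH_AG_MAX_BLOCKS(blocklog)	(((UINT64_C(1) << 40) - 1) >> (blocklog))

/*
 * Single-disk case only: at 4 TiB and above the AG size is capped
 * and the count follows from it; below 4 TiB the count is fixed at
 * 4 and the size follows from it.
 */
void sketch_single_disk_ag_geometry(int blocklog, uint64_t dblocks,
				    uint64_t *agsize, uint64_t *agcount)
{
	uint64_t blocks = 0;
	uint64_t count = 0;

	if (dblocks >= SKETCH_TERABYTES(4, blocklog))
		blocks = SKETCH_AG_MAX_BLOCKS(blocklog);
	else
		count = 4;

	/* Exactly one of the two is known here; derive the other. */
	assert(count || blocks);
	if (!count)
		count = dblocks / blocks + (dblocks % blocks != 0);
	if (!blocks)
		blocks = dblocks / count;

	*agsize = blocks;
	*agcount = count;
}
```

With 4 KiB blocks (blocklog = 12), a 750 GiB device (196608000
blocks) comes out as 4 AGs of 49152000 blocks each. An 8 TiB device
comes out as 9 AGs, because the assumed cap is one block short of
1 TiB and the remainder lands in a small trailing AG.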

From owner-xfs@oss.sgi.com Mon Nov 12 20:11:05 2007
Received: with ECARTIS (v1.0.0; list xfs); Mon, 12 Nov 2007 20:11:15 -0800 (PST)
X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-2.2 required=5.0 tests=AWL,BAYES_00 autolearn=ham
	version=3.3.0-r574664
Received: from itchy (dhcp17.melbourne.sgi.com [134.14.55.17])
	by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAD4B2oa012297
	for ; Mon, 12 Nov 2007 20:11:04 -0800
Received: by itchy (Postfix, from userid 16403) id 997DEBAFE;
	Tue, 13 Nov 2007 15:10:57 +1100 (EST)
From: xaiki@sgi.com
To: xfs@oss.sgi.com
Cc: Niv Sardi
Subject: [PATCH TAKE 2 2/6] Default to version 2 attributes.
Date: Tue, 13 Nov 2007 15:10:53 +1100
Message-Id: <1194927057-26415-3-git-send-email-xaiki@sgi.com>
X-Mailer: git-send-email 1.5.3.5
In-Reply-To: <1194927057-26415-2-git-send-email-xaiki@sgi.com>
References: <20071029075657.GA84369978@melbourne.sgi.com>
	<1194927057-26415-1-git-send-email-xaiki@sgi.com>
	<1194927057-26415-2-git-send-email-xaiki@sgi.com>
X-Virus-Scanned: ClamAV 0.91.2/4757/Mon Nov 12 09:20:27 2007 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 13642
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: xaiki@sgi.com
Precedence: bulk
X-list: xfs

From: Niv Sardi

Change attrversion from 0 to 2 in xfs_mkfs.c
---
 xfsprogs/mkfs/xfs_mkfs.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/xfsprogs/mkfs/xfs_mkfs.c b/xfsprogs/mkfs/xfs_mkfs.c
index 5f3299d..b378800 100644
--- a/xfsprogs/mkfs/xfs_mkfs.c
+++ b/xfsprogs/mkfs/xfs_mkfs.c
@@ -677,7 +677,7 @@ main(
 	bindtextdomain(PACKAGE, LOCALEDIR);
 	textdomain(PACKAGE);

-	attrversion = 0;
+	attrversion = 2;
 	blflag = bsflag = slflag = ssflag = lslflag = lssflag = 0;
 	blocklog = blocksize = 0;
 	sectorlog = lsectorlog = XFS_MIN_SECTORSIZE_LOG;
--
1.5.3.5

From owner-xfs@oss.sgi.com Mon Nov 12 20:11:05 2007
Received:
	with ECARTIS (v1.0.0; list xfs); Mon, 12 Nov 2007 20:11:11 -0800 (PST)
X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-2.1 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_46,
	J_CHICKENPOX_47 autolearn=no version=3.3.0-r574664
Received: from itchy (dhcp17.melbourne.sgi.com [134.14.55.17])
	by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAD4B2ol012298
	for ; Mon, 12 Nov 2007 20:11:04 -0800
Received: by itchy (Postfix, from userid 16403) id A9353BB06;
	Tue, 13 Nov 2007 15:10:57 +1100 (EST)
From: xaiki@sgi.com
To: xfs@oss.sgi.com
Cc: Niv Sardi
Subject: [PATCH TAKE 2 5/6] reduce imaxpct for big filesystems,
Date: Tue, 13 Nov 2007 15:10:56 +1100
Message-Id: <1194927057-26415-6-git-send-email-xaiki@sgi.com>
X-Mailer: git-send-email 1.5.3.5
In-Reply-To: <1194927057-26415-5-git-send-email-xaiki@sgi.com>
References: <20071029075657.GA84369978@melbourne.sgi.com>
	<1194927057-26415-1-git-send-email-xaiki@sgi.com>
	<1194927057-26415-2-git-send-email-xaiki@sgi.com>
	<1194927057-26415-3-git-send-email-xaiki@sgi.com>
	<1194927057-26415-4-git-send-email-xaiki@sgi.com>
	<1194927057-26415-5-git-send-email-xaiki@sgi.com>
X-Virus-Scanned: ClamAV 0.91.2/4757/Mon Nov 12 09:20:27 2007 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 13641
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: xaiki@sgi.com
Precedence: bulk
X-list: xfs

From: Niv Sardi

imaxpct is set to 25% (XFS_DFL_IMAXIMUM_PCT) for FS < 1 TB, then 5% for
FS < 50 TB, and then (50 TB and over) 1%.
It is implemented as a simple step function in calc_default_imaxpct()
---
 xfsprogs/mkfs/xfs_mkfs.c |   27 +++++++++++++++++++++++++--
 1 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/xfsprogs/mkfs/xfs_mkfs.c b/xfsprogs/mkfs/xfs_mkfs.c
index 3689eb7..a8af6e2 100644
--- a/xfsprogs/mkfs/xfs_mkfs.c
+++ b/xfsprogs/mkfs/xfs_mkfs.c
@@ -374,6 +374,29 @@ validate_log_size(__uint64_t logblocks, int blocklog, int min_logblocks)
 	}
 }

+static int
+calc_default_imaxpct(
+	int		blocklog,
+	__uint64_t	dblocks)
+{
+	/*
+	 * This returns the maximum % of the disk space that may be used
+	 * for inodes; it changes relative to the FS size:
+	 *  - 50 TB and over, use 1%,
+	 *  - 1TB - 50 TB, use 5%,
+	 *  - under 1 TB, use XFS_DFL_IMAXIMUM_PCT (25%).
+	 */
+
+	if (dblocks < TERABYTES(1, blocklog)) {
+		return XFS_DFL_IMAXIMUM_PCT;
+	} else if (dblocks < TERABYTES(50, blocklog)) {
+		return 5;
+	}
+
+	return 1;
+}
+
+
 void
 calc_default_ag_geometry(
 	int		blocklog,
@@ -1986,7 +2009,7 @@ an AG size that is one stripe unit smaller, for example %llu.\n"),
 			dfile, isize, (long long)agcount, (long long)agsize,
 			"", sectorsize, attrversion,
 			"", blocksize, (long long)dblocks,
-				imflag ? imaxpct : XFS_DFL_IMAXIMUM_PCT,
+				imflag ? imaxpct : calc_default_imaxpct(blocklog, dblocks),
 			"", dsunit, dswidth,
 			dirversion, dirversion == 1 ? blocksize : dirblocksize,
 			logfile, 1 << blocklog, (long long)logblocks,
@@ -2023,7 +2046,7 @@ an AG size that is one stripe unit smaller, for example %llu.\n"),
 		(__uint8_t)(rtextents ?
 			libxfs_highbit32((unsigned int)rtextents) : 0);
 	sbp->sb_inprogress = 1;	/* mkfs is in progress */
-	sbp->sb_imax_pct = imflag ? imaxpct : XFS_DFL_IMAXIMUM_PCT;
+	sbp->sb_imax_pct = imflag ? imaxpct : calc_default_imaxpct(blocklog, dblocks);
 	sbp->sb_icount = 0;
 	sbp->sb_ifree = 0;
 	sbp->sb_fdblocks = dblocks - agcount * XFS_PREALLOC_BLOCKS(mp) -
--
1.5.3.5

From owner-xfs@oss.sgi.com Mon Nov 12 20:11:07 2007
Received: with ECARTIS (v1.0.0; list xfs); Mon, 12 Nov 2007 20:11:26 -0800 (PST)
X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-2.3 required=5.0 tests=AWL,BAYES_00 autolearn=ham
	version=3.3.0-r574664
Received: from itchy (dhcp17.melbourne.sgi.com [134.14.55.17])
	by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAD4B2T5012300
	for ; Mon, 12 Nov 2007 20:11:04 -0800
Received: by itchy (Postfix, from userid 16403) id A64DABB02;
	Tue, 13 Nov 2007 15:10:57 +1100 (EST)
From: xaiki@sgi.com
To: xfs@oss.sgi.com
Cc: Niv Sardi
Subject: [PATCH TAKE 2 4/6] V2 inodes per default, and move DFL bits to XFS_DFL_SB_VERSION_BITS,
Date: Tue, 13 Nov 2007 15:10:55 +1100
Message-Id: <1194927057-26415-5-git-send-email-xaiki@sgi.com>
X-Mailer: git-send-email 1.5.3.5
In-Reply-To: <1194927057-26415-4-git-send-email-xaiki@sgi.com>
References: <20071029075657.GA84369978@melbourne.sgi.com>
	<1194927057-26415-1-git-send-email-xaiki@sgi.com>
	<1194927057-26415-2-git-send-email-xaiki@sgi.com>
	<1194927057-26415-3-git-send-email-xaiki@sgi.com>
	<1194927057-26415-4-git-send-email-xaiki@sgi.com>
X-Virus-Scanned: ClamAV 0.91.2/4757/Mon Nov 12 09:20:27 2007 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 13646
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: xaiki@sgi.com
Precedence: bulk
X-list: xfs

From: Niv Sardi

Activate XFS_SB_VERSION_NLINKBIT by default, which will enable V2
inodes.

Refactor the bits that we always want into XFS_DFL_SB_VERSION_BITS.
--- xfsprogs/mkfs/xfs_mkfs.h | 6 +++++- 1 files changed, 5 insertions(+), 1 deletions(-) diff --git a/xfsprogs/mkfs/xfs_mkfs.h b/xfsprogs/mkfs/xfs_mkfs.h index f19f917..5cc841c 100644 --- a/xfsprogs/mkfs/xfs_mkfs.h +++ b/xfsprogs/mkfs/xfs_mkfs.h @@ -18,17 +18,21 @@ #ifndef __XFS_MKFS_H__ #define __XFS_MKFS_H__ +#define XFS_DFL_SB_VERSION_BITS \ + (XFS_SB_VERSION_NLINKBIT | \ + XFS_SB_VERSION_EXTFLGBIT) + #define XFS_SB_VERSION_MKFS(ia,dia,dir2,log2,attr1,sflag,more) (\ ((ia)||(dia)||(dir2)||(log2)||(attr1)||(sflag)||(more)) ? \ ( XFS_SB_VERSION_4 | \ ((ia) ? XFS_SB_VERSION_ALIGNBIT : 0) | \ ((dia) ? XFS_SB_VERSION_DALIGNBIT : 0) | \ - (XFS_SB_VERSION_EXTFLGBIT) | \ ((dir2) ? XFS_SB_VERSION_DIRV2BIT : 0) | \ ((log2) ? XFS_SB_VERSION_LOGV2BIT : 0) | \ ((attr1) ? XFS_SB_VERSION_ATTRBIT : 0) | \ ((sflag) ? XFS_SB_VERSION_SECTORBIT : 0) | \ ((more) ? XFS_SB_VERSION_MOREBITSBIT : 0) | \ + XFS_DFL_SB_VERSION_BITS | \ 0 ) : XFS_SB_VERSION_1 ) #define XFS_SB_VERSION2_MKFS(lazycount, attr2, parent) (\ -- 1.5.3.5 From owner-xfs@oss.sgi.com Mon Nov 12 20:47:32 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 12 Nov 2007 20:47:44 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAD4lR51022635 for ; Mon, 12 Nov 2007 20:47:31 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA05222; Tue, 13 Nov 2007 15:47:29 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAD4lSdD102789330; Tue, 13 Nov 2007 15:47:29 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com 
(SGI-8.12.5/8.12.5/Submit) id lAD4lRaa104144005; Tue, 13 Nov 2007 15:47:27 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Tue, 13 Nov 2007 15:47:27 +1100 From: David Chinner To: xaiki@sgi.com Cc: xfs@oss.sgi.com Subject: Re: RESEND(2) Message-ID: <20071113044727.GK995458@sgi.com> References: <20071029075657.GA84369978@melbourne.sgi.com> <1194927057-26415-1-git-send-email-xaiki@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1194927057-26415-1-git-send-email-xaiki@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4757/Mon Nov 12 09:20:27 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13648 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Tue, Nov 13, 2007 at 03:10:51PM +1100, xaiki@sgi.com wrote: > > Added descriptions to first patches, > Added extended description for unwritten extents, updated manpages. > Refactored XFS_DFL_SB_VERSION_BITS as suggested by dave. > Added in-code description for reduced imaxpct for big filesystems. > made calc_default_imaxpct static, as suggested by dave. > Removed mention to cleaning typos ;) > implemented new logic as suggested by dave. All looks ok now, Niv. Good work. Cheers, Dave.
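For readers following the series, here is a minimal standalone sketch of the imaxpct step function from the calc_default_imaxpct() patch earlier in this thread. It is an illustration, not the patch itself: the terabytes() helper is an assumption about how the TERABYTES macro converts TiB into filesystem blocks (a shift by 40 - blocklog), and DFL_IMAXIMUM_PCT is set to 25 only because the patch comment gives that value for XFS_DFL_IMAXIMUM_PCT.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value, taken from the "(25%)" note in the patch comment. */
#define DFL_IMAXIMUM_PCT 25

/*
 * Number of filesystem blocks in `count` TiB when the block size is
 * 2^blocklog bytes. Hypothetical stand-in for the TERABYTES macro.
 */
static uint64_t terabytes(uint64_t count, int blocklog)
{
	assert(blocklog >= 0 && blocklog < 40);
	return count << (40 - blocklog);
}

/* Step function: 25% under 1 TiB, 5% from 1 TiB up to 50 TiB, 1% above. */
static int calc_default_imaxpct(int blocklog, uint64_t dblocks)
{
	if (dblocks < terabytes(1, blocklog))
		return DFL_IMAXIMUM_PCT;
	if (dblocks < terabytes(50, blocklog))
		return 5;
	return 1;
}
```

With 4 KiB blocks (blocklog = 12), 1 TiB is 2^28 blocks, so a filesystem of exactly 2^28 blocks already falls into the 5% band and one of 50 * 2^28 blocks into the 1% band.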
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Tue Nov 13 07:58:56 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 13 Nov 2007 07:59:03 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=0.4 required=5.0 tests=BAYES_00,HTML_MESSAGE autolearn=no version=3.3.0-r574664 Received: from rn-out-0102.google.com (rn-out-0910.google.com [64.233.170.190]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lADFwtCx024225 for ; Tue, 13 Nov 2007 07:58:56 -0800 Received: by rn-out-0102.google.com with SMTP id a43so968368rne for ; Tue, 13 Nov 2007 07:59:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:mime-version:content-type; bh=gNfmIN5Z1WlkdlJRKHwI5q0Gv9Y0pyk5SUkvtK7fd/I=; b=DdbFNUb/6azDW3AM1lDv4f8flFjX/ZZpuOrxL/EG/l7qwaootXcVWxvsOk2nJf8Ng66pa88YbmMcmYTgfnNDuA/Uaq811p6GuFzhZq3vxqXyWxBZXdbwpyQOf72YBoZyNw5FS4N2K/pna90TEI9po986Eq3aJtuZHmuXURUnJ1k= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:to:subject:mime-version:content-type; b=f1libSnPdl4icQUjFYd1URvlyo1v7dm/jc0dZ4/EpR4BZLpKLLqrVllKtvczVCYLOIlUpu9iE58wh/DalDCU0Az1W+MHQiz72o2EHofuYOsb2wzk8GZKEx0NhmWEgR7yu5dW1OU66xrdORl5fgzB2eJFTYIaG2d5sfSAhWFUfXs= Received: by 10.142.104.9 with SMTP id b9mr132769wfc.1194965828693; Tue, 13 Nov 2007 06:57:08 -0800 (PST) Received: by 10.142.148.10 with HTTP; Tue, 13 Nov 2007 06:57:08 -0800 (PST) Message-ID: Date: Tue, 13 Nov 2007 22:57:08 +0800 From: fatcat To: xfs@oss.sgi.com Subject: Latest version of XFS MIME-Version: 1.0 X-Virus-Scanned: ClamAV 0.91.2/4764/Tue Nov 13 04:43:47 2007 on oss.sgi.com X-Virus-Status: Clean Content-Type: text/plain Content-Disposition: inline Content-Transfer-Encoding: 7bit Content-length: 128 X-archive-position: 13649 X-ecartis-version: Ecartis v1.0.0 Sender: 
xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: wayne1017@gmail.com Precedence: bulk X-list: xfs Hello, I'm new to XFS, and I want to know the latest version of XFS. xfs1.2 ? or xfs1.4 ? [[HTML alternate version deleted]] From owner-xfs@oss.sgi.com Tue Nov 13 08:17:16 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 13 Nov 2007 08:17:21 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,SPF_HELO_PASS autolearn=ham version=3.3.0-r574664 Received: from sandeen.net (sandeen.net [209.173.210.139]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lADGHFm4027366 for ; Tue, 13 Nov 2007 08:17:16 -0800 Received: from [192.168.0.5] (63-225-85-186.ptld.qwest.net [63.225.85.186]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by sandeen.net (Postfix) with ESMTP id 1876A1802868A; Tue, 13 Nov 2007 10:17:19 -0600 (CST) Message-ID: <4739CE0C.8050404@sandeen.net> Date: Tue, 13 Nov 2007 08:17:16 -0800 From: Eric Sandeen User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728) MIME-Version: 1.0 To: fatcat CC: xfs@oss.sgi.com Subject: Re: Latest version of XFS References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4764/Tue Nov 13 04:43:47 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13650 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: sandeen@sandeen.net Precedence: bulk X-list: xfs fatcat wrote: > Hello, > I'm new to XFS, and I want to know the latest version of XFS. > xfs1.2 ? or xfs1.4 ? XFS is no longer released with such versions, as it was before it was included in the upstream linux kernel. Now the latest version is "2.6.23" or "2.6.24-rc2" or "xfs cvs" depending on your point of view... 
-Eric From owner-xfs@oss.sgi.com Tue Nov 13 10:21:02 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 13 Nov 2007 10:21:07 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=0.4 required=5.0 tests=AWL,BAYES_05 autolearn=ham version=3.3.0-r574664 Received: from mail.pawisda.de (mail.pawisda.de [213.157.4.156]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lADIKxKK015204 for ; Tue, 13 Nov 2007 10:21:02 -0800 Received: from localhost (localhost.intra.frontsite.de [127.0.0.1]) by mail.pawisda.de (Postfix) with ESMTP id D779DF52B for ; Tue, 13 Nov 2007 19:21:05 +0100 (CET) Received: from mail.pawisda.de ([127.0.0.1]) by localhost (ndb [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 32306-09 for ; Tue, 13 Nov 2007 19:21:00 +0100 (CET) Received: from [192.168.51.2] (lw-pc002.intra.frontsite.de [192.168.51.2]) by mail.pawisda.de (Postfix) with ESMTP id 878AFF50C for ; Tue, 13 Nov 2007 19:21:00 +0100 (CET) Message-ID: <4739EB0B.6030407@linworks.de> Date: Tue, 13 Nov 2007 19:20:59 +0100 From: Ruben Porras User-Agent: Mozilla-Thunderbird 2.0.0.6 (X11/20071009) MIME-Version: 1.0 To: xfs@oss.sgi.com Subject: porting xfs_reno to linux Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 8bit X-Virus-Scanned: ClamAV 0.91.2/4766/Tue Nov 13 08:11:35 2007 on oss.sgi.com X-Virus-Scanned: by amavisd-new at pawisda.de X-Virus-Status: Clean X-archive-position: 13651 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: ruben.porras@linworks.de Precedence: bulk X-list: xfs Let's go back to the shrink xfs theme. > 2. Move inodes out of offline AGs > - On Irix, we have a program called 'xfs_reno' which > converts 64 bit inode filesystems to 32 bit inode > filesystems. This needs to be: > - released under the GPL (should not be a problem). 
done > - ported to linux Do you mean, rewrite the program to work on kernel space, or just port it to glibc? Doing most of it in user land is easier, since xfs_reno is already there, but it requires the AGs that are going to disappear to be traversed several times. Each 'marked' AG has to be traversed once to find its list of inodes. These inodes are then looked up again several times during the xfs_reno operation (e.g. while copying attributes, linking and unlinking files...). Is this information cached? On the other hand, in kernel space it can be done without going through the ioctls. > - modified to understand inodes sit in certain > AGs and to move them out of those AGs as needed. > - requires filesystem traversal to find all the > inodes to be moved. To accomplish this I would write a function that traverses all the 'marked' AGs and returns the list of IDs to reallocate. This list could be exported through an ioctl to user space if needed. I can't find any function that traverses an AG; I've only seen functions that look up the records of an inode. Is there any function that does something similar, or do I need to write it from scratch? Where can I find examples of it? Is xfs_inobt_lookup the best one? Thanks.
-- Rubén Porras LinWorks GmbH From owner-xfs@oss.sgi.com Tue Nov 13 10:53:21 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 13 Nov 2007 10:53:24 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-0.2 required=5.0 tests=AWL,BAYES_40,J_CHICKENPOX_45, SPF_HELO_FAIL autolearn=no version=3.3.0-r574664 Received: from mxmail.synplicity.com (synvpn.synplicity.com [209.157.48.1]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lADIrGGc019768 for ; Tue, 13 Nov 2007 10:53:20 -0800 X-IronPort-AV: E=Sophos;i="4.21,411,1188802800"; d="scan'208";a="608396" Received: from mailhost.synplicity.com (HELO synplcty.synplicity.com) ([209.24.66.180]) by mxmail.synplicity.com with ESMTP; 13 Nov 2007 10:53:23 -0800 Received: from [63.110.200.37] (localhost [127.0.0.1]) by synplcty.synplicity.com (8.13.1/8.12.11) with ESMTP id lADIrJZM007085 for ; Tue, 13 Nov 2007 10:53:21 -0800 (PST) Message-ID: <4739F2CD.2020800@synplicity.com> Date: Tue, 13 Nov 2007 10:54:05 -0800 From: Chris Eddington User-Agent: Thunderbird 2.0.0.6 (Windows/20070728) MIME-Version: 1.0 To: "xfs@oss.sgi.com" Subject: xfs_repair - what's the damage? Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4768/Tue Nov 13 09:25:08 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13652 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: chrise@synplicity.com Precedence: bulk X-list: xfs Hi, Can someone point me to instructions on how to understand the scope of damage to this filesystem based on the output from xfs_repair below? What is it repairing, and what data is lost? I'm not sure how to interpret these messages or where to go to find out. I'm using xfs_repair v2.8.18 on Ubuntu Linux. 
Thks, Chris ----------------- xfs_repair -n /dev/md0 - creating 4 worker thread(s) Phase 1 - find and verify superblock... - reporting progress in intervals of 15 minutes Phase 2 - using internal log - scan filesystem freespace and inode maps... bad on-disk superblock 2 - inconsistent filesystem geometry in realtime filesystem component primary/secondary superblock 2 conflict - AG superblock geometry info conflicts with filesystem geometry would reset bad sb for ag 2 bad uncorrected agheader 2, skipping ag... bad on-disk superblock 24 - bad magic number primary/secondary superblock 24 conflict - AG superblock geometry info conflicts with filesystem geometry bad flags field in superblock 24 bad shared version number in superblock 24 bad inode alignment field in superblock 24 bad stripe unit/width fields in superblock 24 bad log/data device sector size fields in superblock 24 bad magic # 0xc486a1e7 for agi 24 bad version # 127171049 for agi 24 bad sequence # 606867126 for agi 24 bad length # -48052605 for agi 24, should be 11446496 would reset bad sb for ag 24 would reset bad agi for ag 24 bad uncorrected agheader 24, skipping ag... - 10:49:34: scanning filesystem freespace - 30 of 32 allocation groups done - found root inode chunk Phase 3 - for each AG... - scan (but don't clear) agi unlinked lists... error following ag 24 unlinked list - 10:49:34: scanning agi unlinked lists - 32 of 32 allocation groups done - process known inodes and perform inode discovery... - agno = 0 - agno = 1 - agno = 2 - agno = 3 - agno = 4 imap claims a free inode 268435719 is in use, would correct imap and clear inode bad nblocks 23 for inode 268435723, would reset to 13 corrupt block 0 in directory inode 259 would junk block no . entry for directory 259 no .. 
entry for directory 259 - agno = 5 - agno = 6 - agno = 7 - agno = 8 attribute entry 0 in attr block 0, inode 2147610149 has bad name (namelen = 0) problem with attribute contents in inode 2147610149 would clear attr fork bad nblocks 11 for inode 2147610149, would reset to 10 bad anextents 1 for inode 2147610149, would reset to 0 attribute entry 0 in attr block 0, inode 2147610376 has bad name (namelen = 0) problem with attribute contents in inode 2147610376 would clear attr fork bad nblocks 13 for inode 2147610376, would reset to 12 bad anextents 1 for inode 2147610376, would reset to 0 - agno = 9 - agno = 10 - agno = 11 imap claims in-use inode 2173744652 is free, would correct imap data fork in ino 2423071372 claims free block 201330859 data fork in ino 2423071372 claims free block 201330860 data fork in ino 2423071372 claims free block 201330861 data fork in ino 2423071372 claims free block 201330862 data fork in ino 2423071372 claims free block 201330863 data fork in ino 2423071375 claims free block 268470033 data fork in ino 2423071375 claims free block 268470034 data fork in ino 2423071375 claims free block 268470035 data fork in ino 2423071375 claims free block 268470036 data fork in ino 2423071375 claims free block 268470037 data fork in ino 2423071375 claims free block 268470038 data fork in ino 2423071376 claims free block 301992536 data fork in ino 2423071376 claims free block 301992537 data fork in ino 2423071376 claims free block 301992538 data fork in ino 2423071376 claims free block 301992539 data fork in ino 2423071376 claims free block 301992540 data fork in ino 2423071376 claims free block 301992541 imap claims a free inode 2423071393 is in use, would correct imap and clear inode imap claims a free inode 2423071394 is in use, would correct imap and clear inode imap claims a free inode 2423071395 is in use, would correct imap and clear inode imap claims in-use inode 2423071409 is free, would correct imap imap claims a free inode 2423071413 is in 
use, would correct imap and clear inode imap claims a free inode 2691467538 is in use, would correct imap and clear inode imap claims in-use inode 2691467544 is free, would correct imap data fork in ino 2691568164 claims free block 352325825 data fork in ino 2691568164 claims free block 352325826 data fork in ino 2691568164 claims free block 352325827 data fork in ino 2691568164 claims free block 352325828 data fork in ino 2691568164 claims free block 352325829 data fork in ino 2691568164 claims free block 352325830 data fork in ino 2691568164 claims free block 352325831 data fork in ino 2691568166 claims free block 385877409 data fork in ino 2691568166 claims free block 385877410 data fork in ino 2691568166 claims free block 385877411 data fork in ino 2691568166 claims free block 385877412 data fork in ino 2691568166 claims free block 385877413 data fork in ino 2691568166 claims free block 385877414 data fork in ino 2691568166 claims free block 385877415 data fork in ino 2691568170 claims free block 184554409 data fork in ino 2691568170 claims free block 184554410 data fork in ino 2691568170 claims free block 184554411 data fork in ino 2691568170 claims free block 184554412 data fork in ino 2691568170 claims free block 184554413 data fork in ino 2691568170 claims free block 184554414 imap claims in-use inode 2691568170 is free, would correct imap data fork in ino 2691568173 claims free block 251661537 data fork in ino 2691568173 claims free block 251661538 data fork in ino 2691568173 claims free block 251661539 data fork in ino 2691568173 claims free block 251661540 data fork in ino 2691568173 claims free block 251661541 data fork in ino 2691568173 claims free block 251661542 data fork in ino 2691568174 claims free block 285214025 data fork in ino 2691568174 claims free block 285214026 data fork in ino 2691568174 claims free block 285214027 data fork in ino 2691568174 claims free block 285214028 data fork in ino 2691568174 claims free block 285214029 data fork in 
ino 2691568174 claims free block 285214030 data fork in ino 2691568177 claims free block 318768281 data fork in ino 2691568177 claims free block 318768282 data fork in ino 2691568177 claims free block 318768283 data fork in ino 2691568177 claims free block 318768284 data fork in ino 2691568177 claims free block 318768285 data fork in ino 2691568177 claims free block 318768286 data fork in ino 2691568177 claims free block 318768287 imap claims in-use inode 2691568177 is free, would correct imap imap claims in-use inode 2691568178 is free, would correct imap imap claims in-use inode 2691568180 is free, would correct imap imap claims in-use inode 2691568181 is free, would correct imap imap claims in-use inode 2691568183 is free, would correct imap imap claims in-use inode 2691568184 is free, would correct imap imap claims in-use inode 2691568185 is free, would correct imap - agno = 12 - agno = 13 - agno = 14 - agno = 15 - agno = 16 - agno = 17 - agno = 18 - agno = 19 - agno = 20 - agno = 21 - agno = 22 - agno = 23 - agno = 24 - agno = 25 - agno = 26 - agno = 27 - agno = 28 - agno = 29 - agno = 30 - agno = 31 - 10:53:49: process known inodes and inode discovery - 191040 of 202304 inodes done - process newly discovered inodes... - 10:53:50: process newly discovered inodes - 64 of 32 allocation groups done Phase 4 - check for duplicate blocks... - setting up duplicate extent list... - 10:53:50: setting up duplicate extent list - 32 of 32 allocation groups done - check for inodes claiming duplicate blocks... - agno = 0 corrupt block 0 in directory inode 259 would junk block - agno = 1 bad nblocks 23 for inode 268435723, would reset to 13 entry "verif" at block 0 offset 480 in directory inode 268435726 references non-existent inode 536871184 would clear inode number in entry at offset 480... 
entry "rev_1" in shortform directory 268435746 references non-existent inode 536871208 would have junked entry "rev_1" in directory inode 268435746 entry "vhdl" in shortform directory 268435752 references non-existent inode 536871227 would have junked entry "vhdl" in directory inode 268435752 entry "blackbox_impl_1" in shortform directory 268435755 references non-existent inode 536871231 would have junked entry "blackbox_impl_1" in directory inode 268435755 entry "vhdl" in shortform directory 268435762 references non-existent inode 536871459 would have junked entry "vhdl" in directory inode 268435762 entry "aqm_sr_2add_fold" in shortform directory 268435765 references non-existent inode 536871469 would have junked entry "aqm_sr_2add_fold" in directory inode 268435765 entry "amplify" in shortform directory 268435768 references non-existent inode 536871472 would have junked entry "amplify" in directory inode 268435768 entry "vhdl" in shortform directory 268435769 references non-existent inode 536871475 would have junked entry "vhdl" in directory inode 268435769 entry "SynDSPparallel49_sync_ret" at block 0 offset 200 in directory inode 268435772 references non-existent inode 536871479 would clear inode number in entry at offset 200... entry "resynthesis" at block 0 offset 944 in directory inode 268435972 references non-existent inode 536871682 would clear inode number in entry at offset 944... entry "verif" at block 0 offset 368 in directory inode 268437003 references non-existent inode 536871690 would clear inode number in entry at offset 368... - agno = 3 entry ".." at block 0 offset 32 in directory inode 805306662 references non-existent inode 536871459 would clear inode number in entry at offset 32... no . entry for directory 259 no .. entry for directory 259 entry "syntmp" at block 0 offset 1408 in directory inode 291 references non-existent inode 536871482 would clear inode number in entry at offset 1408... 
entry "rev_1" at block 0 offset 208 in directory inode 268437017 references non-existent inode 536871692 would clear inode number in entry at offset 208... entry "test_fsm_arbiter" in shortform directory 268437038 references non-existent inode 536871721 would have junked entry "test_fsm_arbiter" in directory inode 268437038 entry "src" in shortform directory 268437039 references non-existent inode 536871722 would have junked entry "src" in directory inode 268437039 entry "slprj" at block 0 offset 1840 in directory inode 268437040 references non-existent inode 536871738 would clear inode number in entry at offset 1840... entry "src" in shortform directory 268437290 references non-existent inode 536871740 would have junked entry "src" in directory inode 268437290 entry ".." at block 0 offset 32 in directory inode 805307151 references non-existent inode 536871475 would clear inode number in entry at offset 32... entry ".." at block 0 offset 32 in directory inode 805307159 references non-existent inode 536871479 would clear inode number in entry at offset 32... - agno = 2 - agno = 4 entry ".." at block 0 offset 32 in directory inode 1073742080 references non-existent inode 536871168 would clear inode number in entry at offset 32... entry "object_2" in shortform directory 1073742083 references non-existent inode 536901176 would have junked entry "object_2" in directory inode 1073742083 entry ".." at block 0 offset 32 in directory inode 1073742098 references non-existent inode 536871208 would clear inode number in entry at offset 32... entry ".." at block 0 offset 32 in directory inode 805308172 references non-existent inode 536872499 would clear inode number in entry at offset 32... entry ".." at block 0 offset 32 in directory inode 805308201 references non-existent inode 536872508 would clear inode number in entry at offset 32... entry ".." 
at block 0 offset 32 in directory inode 805309952 references non-existent inode 536874804 would clear inode number in entry at offset 32... entry ".." at block 0 offset 32 in directory inode 805310005 references non-existent inode 536875066 would clear inode number in entry at offset 32... entry ".." at block 0 offset 32 in directory inode 805310210 references non-existent inode 536875070 would clear inode number in entry at offset 32... entry "verif" at block 0 offset 352 in directory inode 805310210 references non-existent inode 536875267 would clear inode number in entry at offset 352... entry ".." at block 0 offset 32 in directory inode 1073744910 references non-existent inode 536873218 would clear inode number in entry at offset 32... entry ".." at block 0 offset 32 in directory inode 805310515 references non-existent inode 536875326 would clear inode number in entry at offset 32... entry ".." at block 0 offset 32 in directory inode 805310729 references non-existent inode 536875541 would clear inode number in entry at offset 32... entry ".." at block 0 offset 32 in directory inode 805310766 references non-existent inode 536875806 would clear inode number in entry at offset 32... entry ".." at block 0 offset 32 in directory inode 805311009 references non-existent inode 536876306 would clear inode number in entry at offset 32... entry ".." at block 0 offset 32 in directory inode 805311249 references non-existent inode 536876575 would clear inode number in entry at offset 32... entry "pop" at block 0 offset 512 in directory inode 805311249 references non-existent inode 536909576 would clear inode number in entry at offset 512... 
entry "src" in shortform directory 268437291 references non-existent inode 536871948 would have junked entry "src" in directory inode 268437291 entry "src" in shortform directory 268437292 references non-existent inode 536871964 would have junked entry "src" in directory inode 268437292 entry "src" in shortform directory 268437293 references non-existent inode 536871980 would have junked entry "src" in directory inode 268437293 entry "rev_1" at block 0 offset 696 in directory inode 268437301 references non-existent inode 536872195 would clear inode number in entry at offset 696... entry "Retiming" in shortform directory 268437516 references non-existent inode 536872207 would have junked entry "Retiming" in directory inode 268437516 entry "FIXED POINT ARITHMETIC" in shortform directory 268437523 references non-existent inode 536872214 would have junked entry "FIXED POINT ARITHMETIC" in directory inode 268437523 a long list of this stuff .... disconnected dir inode 4130509338, would move to lost+found disconnected inode 4136546817, would move to lost+found disconnected inode 4136546818, would move to lost+found disconnected inode 4136546820, would move to lost+found disconnected inode 4136546821, would move to lost+found disconnected inode 4136546823, would move to lost+found disconnected inode 4136546824, would move to lost+found disconnected inode 4136546826, would move to lost+found disconnected inode 4136546827, would move to lost+found disconnected dir inode 4179320356, would move to lost+found disconnected dir inode 4179320371, would move to lost+found disconnected dir inode 4180337155, would move to lost+found disconnected dir inode 4180337177, would move to lost+found disconnected dir inode 4180337199, would move to lost+found Phase 7 - verify link counts... would have reset inode 268435723 nlinks from 65535 to 1 another long list of these ... 
would have reset inode 268435726 nlinks from 4 to 3 would have reset inode 268435746 nlinks from 3 to 2 would have reset inode 268435752 nlinks from 4 to 2 would have reset inode 268435755 nlinks from 3 to 2 would have reset inode 268435762 nlinks from 3 to 2 would have reset inode 268435765 nlinks from 4 to 2 would have reset inode 268435768 nlinks from 3 to 2 would have reset inode 268435769 nlinks from 3 to 2 would have reset inode 268435772 nlinks from 6 to 5 would have reset inode 268435972 nlinks from 4 to 3 would have reset inode 268437003 nlinks from 5 to 4 would have reset inode 268437017 nlinks from 3 to 2 would have reset inode 4136546825 nlinks from 5 to 4 would have reset inode 4168420144 nlinks from 7 to 4 - 10:54:24: verify link counts - 191040 of 202304 inodes done No modify flag set, skipping filesystem flush and exiting. From owner-xfs@oss.sgi.com Tue Nov 13 13:00:27 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 13 Nov 2007 13:00:32 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lADL0LfD008145 for ; Tue, 13 Nov 2007 13:00:26 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id IAA28903; Wed, 14 Nov 2007 08:00:23 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lADL0LdD102651885; Wed, 14 Nov 2007 08:00:22 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lADL0IPs105101679; Wed, 14 Nov 2007 08:00:18 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f 
Date: Wed, 14 Nov 2007 08:00:18 +1100 From: David Chinner To: Ruben Porras Cc: xfs@oss.sgi.com Subject: Re: porting xfs_reno to linux Message-ID: <20071113210018.GX995458@sgi.com> References: <4739EB0B.6030407@linworks.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4739EB0B.6030407@linworks.de> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4768/Tue Nov 13 09:25:08 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13653 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Tue, Nov 13, 2007 at 07:20:59PM +0100, Ruben Porras wrote: > Let's go back to the shrink xfs theme. > > >2. Move inodes out of offline AGs > > - On Irix, we have a program called 'xfs_reno' which > > converts 64 bit inode filesystems to 32 bit inode > > filesystems. This needs to be: > > - released under the GPL (should not be a problem). > > done > > > - ported to linux > > Do you mean, rewrite the program to work on kernel space, No. > or just port it > to glibc? Port to linux. i.e. make it work and remove any tainted code that it might contain from Irix so we can open source it. Barry has already done this; the patch is here: http://oss.sgi.com/archives/xfs/2007-10/msg00054.html All it needs is reviewing and then xfs_reno for linux is done. Cheers, Dave. 
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Tue Nov 13 13:08:58 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 13 Nov 2007 13:09:02 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_45 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lADL8rDD009492 for ; Tue, 13 Nov 2007 13:08:57 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id IAA29089; Wed, 14 Nov 2007 08:08:55 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lADL8sdD105014461; Wed, 14 Nov 2007 08:08:55 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lADL8r1C105256912; Wed, 14 Nov 2007 08:08:53 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Wed, 14 Nov 2007 08:08:53 +1100 From: David Chinner To: Chris Eddington Cc: "xfs@oss.sgi.com" Subject: Re: xfs_repair - what's the damage? 
Message-ID: <20071113210853.GY995458@sgi.com> References: <4739F2CD.2020800@synplicity.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4739F2CD.2020800@synplicity.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4768/Tue Nov 13 09:25:08 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13654 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Tue, Nov 13, 2007 at 10:54:05AM -0800, Chris Eddington wrote: > Hi, > > Can someone point me to instructions on how to understand the scope of > damage to this filesystem based on the output from xfs_repair below? > What is it repairing, and what data is lost? I'm not sure how to interpret > these messages or where to go to find out. Looks like you had something write crap over various parts of the filesystem. Both AG 2 and AG 24 have header problems, and then there's a bunch of freespace and allocated inode problems because the indexes were lost due to the header corruption. Who knows how much else is broken - it depends on how much bad data got written into the filesystem. Best you can do is to run xfs_repair and sift through the debris in lost+found and try to work out what the lost data is... As I always ask - how did the filesystem get into this state? Cheers, Dave. 
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Tue Nov 13 21:08:15 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 13 Nov 2007 21:08:49 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.4 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAE585WW006020 for ; Tue, 13 Nov 2007 21:08:10 -0800 Received: from timothy-shimmins-power-mac-g5.local (boing.melbourne.sgi.com [134.14.55.141]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA13645; Wed, 14 Nov 2007 16:07:58 +1100 Message-ID: <473A82E6.50709@sgi.com> Date: Wed, 14 Nov 2007 16:08:54 +1100 From: Timothy Shimmin User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728) MIME-Version: 1.0 To: Andreas Gruenbacher CC: linux-xfs@oss.sgi.com, Gerald Bringhurst , Brandon Philips Subject: Re: acl and attr: Fix path walking code References: <200710281858.24428.agruen@suse.de> <4733F301.9020706@sgi.com> <200711102152.05619.agruen@suse.de> In-Reply-To: <200711102152.05619.agruen@suse.de> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4773/Tue Nov 13 15:26:59 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13655 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: tes@sgi.com Precedence: bulk X-list: xfs Andreas Gruenbacher wrote: > On Friday 09 November 2007 06:41:21 Timothy Shimmin wrote: >> I applied attr patch and tried it out on xfstests/062 >> (which I believe was based on one of your tests). 
>> >> ========================================================== >> --- 062.out 2006-03-28 12:52:32.000000000 +1000 >> +++ 062.out.bad 2007-11-09 15:38:09.000000000 +1100 >> @@ -526,6 +526,10 @@ >> user.name=0xbabe >> user.name3=0xdeface >> >> +# file: SCRATCH_MNT/lnk >> +trusted.name=0xbabe >> +trusted.name3=0xdeface >> + >> # file: SCRATCH_MNT/dev/b >> trusted.name=0xbabe >> trusted.name3=0xdeface >> @@ -562,6 +566,10 @@ >> user.1=0x3233 >> user.x=0x797a >> >> +# file: SCRATCH_MNT/descend/and/ascend >> +trusted.9=0x3837 >> +trusted.a=0x6263 >> + >> >> *** directory descent without following symlinks >> # file: SCRATCH_MNT/reg >> ========================================================== >> >> So for the following of symlinks with getfattr -L >> i.e. >> echo "*** directory descent with us following symlinks" >> getfattr -h -L -R -m '.' -e hex $SCRATCH_MNT >> >> Looking at the 2nd difference... >> It now picks up descend/and/ascend which contains the symlink >> of descend/and --> here/up. >> So that makes sense, it is following a symlink which it >> didn't before and finding a dir, "up" in the linked dir. >> Good. >> >> Looking at 1st difference... >> It is now showing up "lnk" which is a symlink: lnk --> dir >> So why is it showing this up >> and yet it is not showing descend/and (which is a link to here/up)? >> So yes we are following symlinks but are we supposed >> to just do the symlinks themselves as well? > > With -h, the utilities operate on the symlinks rather than the files that the > symlinks point to. The test case sets attributes on SCRATCH_MNT/lnk, but not > on descend/and. Oops, yep, there is no EA on descend/and. > > The -h and -L options together don't make much sense actually. > No they don't :) So will it not follow the argument but follow any descendents that it finds on the walk. It kind of looked from the manpage that the -h is about just the argument and not about the walk. Anyway, I took out the -h with the -L, i.e. $ getfattr -L -R -m '.' 
-e hex $SCRATCH_MNT And it is still reporting >> +# file: SCRATCH_MNT/lnk >> +trusted.name=0xbabe >> +trusted.name3=0xdeface >> + So I presume following symlinks also mean operating on symlinks too (i.e. getting the EA)? --- > On Friday 09 November 2007 08:39:56 Timothy Shimmin wrote: >> > You mention -L/-P is like chown. >> > However, -P for getattr isn't about not walking symlinks >> > to directories, >> > it's about skipping symlinks altogether, right? > > Hmm, -L and -P define which files and directories are visited, and -h defines > whether we are looking at symlinks or the files they point to. The two > concepts are orthogonal. -P is not about skipping symlinks, only about not > recursing into them. > Oh okay. There is the concept of following the symlink for traversal versus following the symlink to get the EA on. So with -L should it just follow the symlink or look at the symlink first and then follow it? And will -h modify this behavior? I'm still confused about the 1st difference in 062 output. 
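The two concepts being discussed here (which paths the walk visits, versus which object an operation is applied to) can be sketched outside of getfattr. This is only an illustration built on Python's os.walk and os.lstat/os.stat, not the attr utilities' actual code; the helper names visit and kind are made up for the example.

```python
import os
import stat


def visit(root, follow_links):
    """Return every path a recursive walk reports, relative to root.

    follow_links=False behaves like a -P style walk: a symlink to a
    directory is still reported, but the walk does not recurse into it.
    follow_links=True behaves like a -L style walk: it recurses too.
    """
    visited = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=follow_links):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            visited.append(os.path.relpath(full, root))
    return sorted(visited)


def kind(path, operate_on_link):
    """Classify path, either as the link itself (-h style) or as its target."""
    st = os.lstat(path) if operate_on_link else os.stat(path)
    if stat.S_ISLNK(st.st_mode):
        return "symlink"
    if stat.S_ISDIR(st.st_mode):
        return "directory"
    return "other"
```

In this sketch a symlink such as lnk is always reported by the walk; follow_links only decides whether its contents are visited, and operate_on_link separately decides whether the link or its target is examined. Whether getfattr is meant to behave the same way is exactly the open question in this thread.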
I wonder if the man pages can be clarified in this area :) --Tim From owner-xfs@oss.sgi.com Tue Nov 13 23:30:38 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 13 Nov 2007 23:31:24 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from smtp123.sbc.mail.sp1.yahoo.com (smtp123.sbc.mail.sp1.yahoo.com [69.147.64.96]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAE7UYBu022638 for ; Tue, 13 Nov 2007 23:30:37 -0800 Received: (qmail 83703 invoked from network); 14 Nov 2007 07:04:01 -0000 Received: from unknown (HELO stupidest.org) (cwedgwood@sbcglobal.net@75.36.198.62 with login) by smtp123.sbc.mail.sp1.yahoo.com with SMTP; 14 Nov 2007 07:04:00 -0000 X-YMail-OSG: hjLDFe4VM1mWiJBz1UydZ63KEi1WEBH90fRVSqothaC7jHRXTsCdB3RCWNsvflFK0UEuLMUKgw-- Received: by tuatara.stupidest.org (Postfix, from userid 10000) id 392532808117; Tue, 13 Nov 2007 23:04:00 -0800 (PST) Date: Tue, 13 Nov 2007 23:04:00 -0800 From: Chris Wedgwood To: linux-xfs@oss.sgi.com Cc: LKML Subject: 2.6.24-rc2 XFS nfsd hang Message-ID: <20071114070400.GA25708@puku.stupidest.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Virus-Scanned: ClamAV 0.91.2/4773/Tue Nov 13 15:26:59 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13656 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: cw@f00f.org Precedence: bulk X-list: xfs With 2.6.24-rc2 (amd64) I sometimes (usually but perhaps not always) see a hang when accessing some NFS exported XFS filesystems. Local access to these filesystems ahead of time works without problems. This does not occur with 2.6.23.1. The filesystem does not appear to be corrupt. 
The call chain for the wedged process is: [ 1462.911256] nfsd D ffffffff80547840 4760 2966 2 [ 1462.911283] ffff81010414d4d0 0000000000000046 0000000000000000 ffff81010414d610 [ 1462.911322] ffff810104cbc6e0 ffff81010414d480 ffffffff80746dc0 ffffffff80746dc0 [ 1462.911360] ffffffff80744020 ffffffff80746dc0 ffff81010129c140 ffff8101000ad100 [ 1462.911391] Call Trace: [ 1462.911417] [] __down+0xe9/0x101 [ 1462.911437] [] default_wake_function+0x0/0xe [ 1462.911458] [] __down_failed+0x35/0x3a [ 1462.911480] [] _xfs_buf_find+0x84/0x24d [ 1462.911501] [] _xfs_buf_find+0x193/0x24d [ 1462.911522] [] xfs_buf_lock+0x43/0x45 [ 1462.911543] [] _xfs_buf_find+0x1ba/0x24d [ 1462.911564] [] xfs_buf_get_flags+0x5a/0x14b [ 1462.911586] [] xfs_buf_read_flags+0x12/0x86 [ 1462.911607] [] xfs_trans_read_buf+0x4c/0x2cf [ 1462.911629] [] xfs_da_do_buf+0x41b/0x65b [ 1462.911652] [] xfs_da_read_buf+0x24/0x29 [ 1462.911673] [] xfs_dir2_block_lookup_int+0x4d/0x1ab [ 1462.911694] [] xfs_dir2_block_lookup_int+0x4d/0x1ab [ 1462.911717] [] xfs_dir2_block_lookup+0x15/0x8e [ 1462.911738] [] xfs_dir_lookup+0xd2/0x12c [ 1462.911761] [] submit_bio+0x10d/0x114 [ 1462.911781] [] xfs_dir_lookup_int+0x2c/0xc5 [ 1462.911802] [] lockdep_init_map+0x90/0x495 [ 1462.911823] [] xfs_lookup+0x44/0x6f [ 1462.911843] [] xfs_vn_lookup+0x29/0x60 [ 1462.915246] [] __lookup_hash+0xe5/0x109 [ 1462.915267] [] lookup_one_len+0x41/0x4e [ 1462.915289] [] compose_entry_fh+0xc1/0x117 [ 1462.915311] [] encode_entry+0x17c/0x38b [ 1462.915333] [] find_or_create_page+0x3f/0xc9 [ 1462.915355] [] _xfs_buf_lookup_pages+0x2c1/0x2f6 [ 1462.915377] [] _spin_unlock+0x1f/0x49 [ 1462.915399] [] cache_alloc_refill+0x1ba/0x4b9 [ 1462.915424] [] nfs3svc_encode_entry_plus+0x0/0x13 [ 1462.915448] [] nfs3svc_encode_entry_plus+0x10/0x13 [ 1462.915469] [] xfs_dir2_block_getdents+0x15b/0x1e2 [ 1462.915491] [] nfs3svc_encode_entry_plus+0x0/0x13 [ 1462.915514] [] nfs3svc_encode_entry_plus+0x0/0x13 [ 1462.915534] [] xfs_readdir+0x91/0xb6 [ 
1462.915557] [] nfs3svc_encode_entry_plus+0x0/0x13 [ 1462.915579] [] xfs_file_readdir+0x31/0x40 [ 1462.915599] [] vfs_readdir+0x61/0x93 [ 1462.915619] [] nfs3svc_encode_entry_plus+0x0/0x13 [ 1462.915642] [] nfsd_readdir+0x6d/0xc5 [ 1462.915663] [] nfsd3_proc_readdirplus+0x114/0x204 [ 1462.915686] [] nfsd_dispatch+0xde/0x1b6 [ 1462.915706] [] svc_process+0x3f8/0x717 [ 1462.915729] [] nfsd+0x1a9/0x2c1 [ 1462.915749] [] child_rip+0xa/0x12 [ 1462.915769] [] __svc_create_thread+0xea/0x1eb [ 1462.915792] [] nfsd+0x0/0x2c1 [ 1462.915812] [] child_rip+0x0/0x12 Over time other processes pile up behind this. [ 1462.910728] nfsd D ffffffffffffffff 5440 2965 2 [ 1462.910769] ffff8101040cdd40 0000000000000046 0000000000000001 ffff810103471900 [ 1462.910812] ffff8101029a72c0 ffff8101040cdcf0 ffffffff80746dc0 ffffffff80746dc0 [ 1462.910852] ffffffff80744020 ffffffff80746dc0 ffff81010008e0c0 ffff8101012a1040 [ 1462.910882] Call Trace: [ 1462.910909] [] nfsd_permission+0x95/0xeb [ 1462.910931] [] vfs_readdir+0x46/0x93 [ 1462.910950] [] mutex_lock_nested+0x165/0x27c [ 1462.910971] [] _spin_unlock+0x1f/0x49 [ 1462.910994] [] nfs3svc_encode_entry_plus+0x0/0x13 [ 1462.911015] [] vfs_readdir+0x46/0x93 [ 1462.911037] [] nfs3svc_encode_entry_plus+0x0/0x13 [ 1462.911057] [] nfsd_readdir+0x6d/0xc5 [ 1462.911079] [] nfsd3_proc_readdirplus+0x114/0x204 [ 1462.911102] [] nfsd_dispatch+0xde/0x1b6 [ 1462.911122] [] svc_process+0x3f8/0x717 [ 1462.911143] [] nfsd+0x1a9/0x2c1 [ 1462.911165] [] child_rip+0xa/0x12 [ 1462.911184] [] __svc_create_thread+0xea/0x1eb [ 1462.911206] [] nfsd+0x0/0x2c1 [ 1462.911225] [] child_rip+0x0/0x12 Any suggestions other than to bisect this? (Bisection might be painful as it crosses the x86-merge.) 
From owner-xfs@oss.sgi.com Wed Nov 14 04:49:05 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 14 Nov 2007 04:49:16 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-0.9 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from smtp123.sbc.mail.sp1.yahoo.com (smtp123.sbc.mail.sp1.yahoo.com [69.147.64.96]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAECn3Vg008079 for ; Wed, 14 Nov 2007 04:49:04 -0800 Received: (qmail 59539 invoked from network); 14 Nov 2007 11:49:09 -0000 Received: from unknown (HELO stupidest.org) (cwedgwood@sbcglobal.net@24.5.75.45 with login) by smtp123.sbc.mail.sp1.yahoo.com with SMTP; 14 Nov 2007 11:49:08 -0000 X-YMail-OSG: 3BM2TUgVM1kDu47611Df_i9KFswRZEP.gTBtpmJcXfTvKu4sTQCHvfCKZUeZ842fBORefL3B1g-- Received: by tuatara.stupidest.org (Postfix, from userid 10000) id 1F801284CBA8; Wed, 14 Nov 2007 03:49:07 -0800 (PST) Date: Wed, 14 Nov 2007 03:49:07 -0800 From: Chris Wedgwood To: linux-xfs@oss.sgi.com, Christoph Hellwig , David Chinner Cc: LKML , Benny Halevy , Christian Kujau Subject: 2.6.24-rc2 XFS nfsd hang --- filldir change responsible? Message-ID: <20071114114907.GA31466@puku.stupidest.org> References: <20071114070400.GA25708@puku.stupidest.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071114070400.GA25708@puku.stupidest.org> X-Virus-Scanned: ClamAV 0.91.2/4774/Wed Nov 14 02:27:16 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13657 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: cw@f00f.org Precedence: bulk X-list: xfs On Tue, Nov 13, 2007 at 11:04:00PM -0800, Chris Wedgwood wrote: > With 2.6.24-rc2 (amd64) I sometimes (usually but perhaps not always) > see a hang when accessing some NFS exported XFS filesystems. 
Local > access to these filesystems ahead of time works without problems. > > This does not occur with 2.6.23.1. The filesystem does not appear > to be corrupt. After some bisection pain (sg broken in the middle and XFS not compiling in other places) the regression seems to be: commit 051e7cd44ab8f0f7c2958371485b4a1ff64a8d1b Author: Christoph Hellwig Date: Tue Aug 28 13:58:24 2007 +1000 [XFS] use filldir internally There have been a lot of changes since this so reverting it and retesting as-is won't work. I'll have to see what I can come up with after some sleep. I'm not building/testing with dmapi --- perhaps that makes a difference here? I would think it would have broken with xfsqa but the number of bug reports seems small so far. From owner-xfs@oss.sgi.com Wed Nov 14 05:20:22 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 14 Nov 2007 05:20:41 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from fieldses.org (mail.fieldses.org [66.93.2.214]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAEDKK1N013703 for ; Wed, 14 Nov 2007 05:20:21 -0800 Received: from bfields by fieldses.org with local (Exim 4.68) (envelope-from ) id 1IsHpj-0001Jy-SD; Wed, 14 Nov 2007 07:59:07 -0500 Date: Wed, 14 Nov 2007 07:59:07 -0500 To: Benny Halevy Cc: Chris Wedgwood , linux-xfs@oss.sgi.com, LKML , Christian Kujau Subject: Re: 2.6.24-rc2 XFS nfsd hang Message-ID: <20071114125907.GB4010@fieldses.org> References: <20071114070400.GA25708@puku.stupidest.org> <473AA72C.6020308@panasas.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <473AA72C.6020308@panasas.com> User-Agent: Mutt/1.5.17 (2007-11-01) From: "J. 
Bruce Fields" X-Virus-Scanned: ClamAV 0.91.2/4776/Wed Nov 14 03:49:31 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13658 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: bfields@fieldses.org Precedence: bulk X-list: xfs On Wed, Nov 14, 2007 at 09:43:40AM +0200, Benny Halevy wrote: > I wonder if this is a similar hang to what Christian was seeing here: > http://lkml.org/lkml/2007/11/13/319 Ah, thanks for noticing that. Christian Kujau, is /data an xfs partition? There are a bunch of xfs commits in ^92d15c2ccbb3e31a3fc71ad28fdb55e1319383c0 ^291702f017efdfe556cb87b8530eb7d1ff08cbae ^1d677a6dfaac1d1cf51a7f58847077240985faf2 ^fba956c46a72f9e7503fd464ffee43c632307e31 ^bbf25010f1a6b761914430f5fca081ec8c7accd1 6e800af233e0bdf108efb7bd23c11ea6fa34cdeb 7b1915a989ea4d426d0fd98974ab80f30ef1d779 c223701cf6c706f42840631c1ca919a18e6e2800 f77bf01425b11947eeb3b5b54685212c302741b8 which was the range remaining for him to bisect. --b. > > Benny > > On Nov. 14, 2007, 9:04 +0200, Chris Wedgwood wrote: > > With 2.6.24-rc2 (amd64) I sometimes (usually but perhaps not always) > > see a hang when accessing some NFS exported XFS filesystems. Local > > access to these filesystems ahead of time works without problems. > > > > This does not occur with 2.6.23.1. The filesystem does not appear to > > be corrupt. 
> > > > > > The call chain for the wedged process is: > > > > [ 1462.911256] nfsd D ffffffff80547840 4760 2966 2 > > [ 1462.911283] ffff81010414d4d0 0000000000000046 0000000000000000 ffff81010414d610 > > [ 1462.911322] ffff810104cbc6e0 ffff81010414d480 ffffffff80746dc0 ffffffff80746dc0 > > [ 1462.911360] ffffffff80744020 ffffffff80746dc0 ffff81010129c140 ffff8101000ad100 > > [ 1462.911391] Call Trace: > > [ 1462.911417] [] __down+0xe9/0x101 > > [ 1462.911437] [] default_wake_function+0x0/0xe > > [ 1462.911458] [] __down_failed+0x35/0x3a > > [ 1462.911480] [] _xfs_buf_find+0x84/0x24d > > [ 1462.911501] [] _xfs_buf_find+0x193/0x24d > > [ 1462.911522] [] xfs_buf_lock+0x43/0x45 > > [ 1462.911543] [] _xfs_buf_find+0x1ba/0x24d > > [ 1462.911564] [] xfs_buf_get_flags+0x5a/0x14b > > [ 1462.911586] [] xfs_buf_read_flags+0x12/0x86 > > [ 1462.911607] [] xfs_trans_read_buf+0x4c/0x2cf > > [ 1462.911629] [] xfs_da_do_buf+0x41b/0x65b > > [ 1462.911652] [] xfs_da_read_buf+0x24/0x29 > > [ 1462.911673] [] xfs_dir2_block_lookup_int+0x4d/0x1ab > > [ 1462.911694] [] xfs_dir2_block_lookup_int+0x4d/0x1ab > > [ 1462.911717] [] xfs_dir2_block_lookup+0x15/0x8e > > [ 1462.911738] [] xfs_dir_lookup+0xd2/0x12c > > [ 1462.911761] [] submit_bio+0x10d/0x114 > > [ 1462.911781] [] xfs_dir_lookup_int+0x2c/0xc5 > > [ 1462.911802] [] lockdep_init_map+0x90/0x495 > > [ 1462.911823] [] xfs_lookup+0x44/0x6f > > [ 1462.911843] [] xfs_vn_lookup+0x29/0x60 > > [ 1462.915246] [] __lookup_hash+0xe5/0x109 > > [ 1462.915267] [] lookup_one_len+0x41/0x4e > > [ 1462.915289] [] compose_entry_fh+0xc1/0x117 > > [ 1462.915311] [] encode_entry+0x17c/0x38b > > [ 1462.915333] [] find_or_create_page+0x3f/0xc9 > > [ 1462.915355] [] _xfs_buf_lookup_pages+0x2c1/0x2f6 > > [ 1462.915377] [] _spin_unlock+0x1f/0x49 > > [ 1462.915399] [] cache_alloc_refill+0x1ba/0x4b9 > > [ 1462.915424] [] nfs3svc_encode_entry_plus+0x0/0x13 > > [ 1462.915448] [] nfs3svc_encode_entry_plus+0x10/0x13 > > [ 1462.915469] [] 
xfs_dir2_block_getdents+0x15b/0x1e2 > > [ 1462.915491] [] nfs3svc_encode_entry_plus+0x0/0x13 > > [ 1462.915514] [] nfs3svc_encode_entry_plus+0x0/0x13 > > [ 1462.915534] [] xfs_readdir+0x91/0xb6 > > [ 1462.915557] [] nfs3svc_encode_entry_plus+0x0/0x13 > > [ 1462.915579] [] xfs_file_readdir+0x31/0x40 > > [ 1462.915599] [] vfs_readdir+0x61/0x93 > > [ 1462.915619] [] nfs3svc_encode_entry_plus+0x0/0x13 > > [ 1462.915642] [] nfsd_readdir+0x6d/0xc5 > > [ 1462.915663] [] nfsd3_proc_readdirplus+0x114/0x204 > > [ 1462.915686] [] nfsd_dispatch+0xde/0x1b6 > > [ 1462.915706] [] svc_process+0x3f8/0x717 > > [ 1462.915729] [] nfsd+0x1a9/0x2c1 > > [ 1462.915749] [] child_rip+0xa/0x12 > > [ 1462.915769] [] __svc_create_thread+0xea/0x1eb > > [ 1462.915792] [] nfsd+0x0/0x2c1 > > [ 1462.915812] [] child_rip+0x0/0x12 > > > > Over time other processes pile up beind this. > > > > [ 1462.910728] nfsd D ffffffffffffffff 5440 2965 2 > > [ 1462.910769] ffff8101040cdd40 0000000000000046 0000000000000001 ffff810103471900 > > [ 1462.910812] ffff8101029a72c0 ffff8101040cdcf0 ffffffff80746dc0 ffffffff80746dc0 > > [ 1462.910852] ffffffff80744020 ffffffff80746dc0 ffff81010008e0c0 ffff8101012a1040 > > [ 1462.910882] Call Trace: > > [ 1462.910909] [] nfsd_permission+0x95/0xeb > > [ 1462.910931] [] vfs_readdir+0x46/0x93 > > [ 1462.910950] [] mutex_lock_nested+0x165/0x27c > > [ 1462.910971] [] _spin_unlock+0x1f/0x49 > > [ 1462.910994] [] nfs3svc_encode_entry_plus+0x0/0x13 > > [ 1462.911015] [] vfs_readdir+0x46/0x93 > > [ 1462.911037] [] nfs3svc_encode_entry_plus+0x0/0x13 > > [ 1462.911057] [] nfsd_readdir+0x6d/0xc5 > > [ 1462.911079] [] nfsd3_proc_readdirplus+0x114/0x204 > > [ 1462.911102] [] nfsd_dispatch+0xde/0x1b6 > > [ 1462.911122] [] svc_process+0x3f8/0x717 > > [ 1462.911143] [] nfsd+0x1a9/0x2c1 > > [ 1462.911165] [] child_rip+0xa/0x12 > > [ 1462.911184] [] __svc_create_thread+0xea/0x1eb > > [ 1462.911206] [] nfsd+0x0/0x2c1 > > [ 1462.911225] [] child_rip+0x0/0x12 > > > > > > Any suggestions other 
than to bisect this? (Bisection might be > > painful as it crosses the x86-merge.) > > - > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > Please read the FAQ at http://www.tux.org/lkml/ > > > From owner-xfs@oss.sgi.com Wed Nov 14 06:19:30 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 14 Nov 2007 06:19:41 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-0.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from mail.pawisda.de (mail.pawisda.de [213.157.4.156]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAEEJOSn023040 for ; Wed, 14 Nov 2007 06:19:30 -0800 Received: from localhost (localhost.intra.frontsite.de [127.0.0.1]) by mail.pawisda.de (Postfix) with ESMTP id 83948F52B; Wed, 14 Nov 2007 15:19:31 +0100 (CET) Received: from mail.pawisda.de ([127.0.0.1]) by localhost (ndb [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 10871-02; Wed, 14 Nov 2007 15:19:22 +0100 (CET) Received: from [192.168.51.2] (lw-pc002.intra.frontsite.de [192.168.51.2]) by mail.pawisda.de (Postfix) with ESMTP id 49CA1C9A8; Wed, 14 Nov 2007 15:19:22 +0100 (CET) Message-ID: <473B03EA.2040002@linworks.de> Date: Wed, 14 Nov 2007 15:19:22 +0100 From: Ruben Porras User-Agent: Mozilla-Thunderbird 2.0.0.6 (X11/20071009) MIME-Version: 1.0 To: David Chinner Cc: xfs@oss.sgi.com Subject: Re: porting xfs_reno to linux References: <4739EB0B.6030407@linworks.de> <20071113210018.GX995458@sgi.com> In-Reply-To: <20071113210018.GX995458@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-Virus-Scanned: ClamAV 0.91.2/4776/Wed Nov 14 03:49:31 2007 on oss.sgi.com X-Virus-Scanned: by amavisd-new at pawisda.de X-Virus-Status: Clean X-archive-position: 13659 
X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: ruben.porras@linworks.de Precedence: bulk X-list: xfs David Chinner wrote: > On Tue, Nov 13, 2007 at 07:20:59PM +0100, Ruben Porras wrote: > >> Let's go back to the shrink xfs theme. >> >> >>> 2. Move inodes out of offline AGs >>> - On Irix, we have a program called 'xfs_reno' which >>> converts 64 bit inode filesystems to 32 bit inode >>> filesystems. This needs to be: >>> - released under the GPL (should not be a problem). >>> >> done >> >> >>> - ported to linux >>> >> Do you mean, rewrite the program to work on kernel space, >> > > No. > > >> or just port it >> to glibc? >> > > Port to linux. i.e. make it work and remove any tainted code that it > might contain from Irix so we can open source it. Barry has > already done this; the patch is here: > > http://oss.sgi.com/archives/xfs/2007-10/msg00054.html > > All it needs is reviewing and then xfs_reno for linux is done. > > Sorry, I didn't explain myself well enough. I already saw the mail from Barry, but the program needs more than a review. Right now xfs_reno filters the inodes with nftw and the stat info. The problem is that we need to filter the inodes according to the AG where they are. Currently there is no way to find this out. Possibilities: a) Extend the bulkstat structure to include the AG number (better not) b) A new ioctl to find out the AG of an inode, called for each file on the fs. c) Find all the inodes in 'marked' AGs and export the list through a new ioctl. xfs_reno then needs to find out later which files on the fs were on the list. The second way would be to do everything in kernel space. The steps would be: a) Find all the inodes in 'marked' AGs b) Allocate a new inode for each one c) Move the information from each old inode to the new one and unlink the old one. (This is what Barry suggests doing in kernel space anyway.) xfs_reno does only steps b) and c). 
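As a side note on filtering by AG: an XFS inode number carries its allocation group number in the high bits, above the AG-relative block number and the inode's offset within the block. A minimal sketch of recovering the AG from an inode number follows. The function name and the geometry values in the usage note are assumptions for illustration; real values would have to come from the filesystem's superblock geometry, and whether this is enough for the shrink work is not settled in this thread.

```python
def inode_to_agno(ino, agblocks, blocksize, inodesize):
    """Return the allocation group number encoded in an XFS inode number.

    Layout assumed: | AG number | block within AG | inode within block |
    """
    # Bits for the block-within-AG field: log2(agblocks), rounded up.
    agblklog = (agblocks - 1).bit_length()
    # Bits for the inode-within-block field: log2(inodes per block).
    inopblog = (blocksize // inodesize).bit_length() - 1
    # Everything above those two fields is the AG number.
    return ino >> (agblklog + inopblog)
```

With a hypothetical geometry of 4096-byte blocks, 256-byte inodes and 2^20 blocks per AG, the low 24 bits are AG-relative, so inode_to_agno(os.stat(path).st_ino, 1 << 20, 4096, 256) would give the AG of a file found during the nftw-style walk.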
The second needs more work, but it doesn't need to traverse the filesystem several times. > Cheers, > > Dave. > -- Rubén Porras LinWorks GmbH From owner-xfs@oss.sgi.com Wed Nov 14 06:48:40 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 14 Nov 2007 06:49:06 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.5 required=5.0 tests=BAYES_00,RDNS_NONE autolearn=no version=3.3.0-r574664 Received: from sa10.bezeqint.net ([192.115.104.24]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAEEmZcP027448 for ; Wed, 14 Nov 2007 06:48:39 -0800 Received: from localhost (unknown [127.0.0.1]) by sa10.bezeqint.net (Bezeq International SMTP out Mail Server) with ESMTP id 9B542106A30; Wed, 14 Nov 2007 09:42:26 +0200 (IST) Received: from sa10.bezeqint.net ([127.0.0.1]) by localhost (sa10.bezeqint.net [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 24275-10; Wed, 14 Nov 2007 09:42:22 +0200 (IST) Received: from fs1.bhalevy.com (unknown [62.219.195.70]) by sa10.bezeqint.net (Bezeq International SMTP out Mail Server) with ESMTP; Wed, 14 Nov 2007 09:42:22 +0200 (IST) Message-ID: <473AA72C.6020308@panasas.com> Date: Wed, 14 Nov 2007 09:43:40 +0200 From: Benny Halevy User-Agent: Thunderbird 2.0.0.6 (X11/20070728) MIME-Version: 1.0 To: Chris Wedgwood CC: linux-xfs@oss.sgi.com, LKML , Christian Kujau , "J. 
Bruce Fields" Subject: Re: 2.6.24-rc2 XFS nfsd hang References: <20071114070400.GA25708@puku.stupidest.org> In-Reply-To: <20071114070400.GA25708@puku.stupidest.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4776/Wed Nov 14 03:49:31 2007 on oss.sgi.com X-Virus-Scanned: amavisd-new at bezeqint.net X-Virus-Status: Clean X-archive-position: 13660 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: bhalevy@panasas.com Precedence: bulk X-list: xfs I wonder if this is a similar hang to what Christian was seeing here: http://lkml.org/lkml/2007/11/13/319 Benny On Nov. 14, 2007, 9:04 +0200, Chris Wedgwood wrote: > With 2.6.24-rc2 (amd64) I sometimes (usually but perhaps not always) > see a hang when accessing some NFS exported XFS filesystems. Local > access to these filesystems ahead of time works without problems. > > This does not occur with 2.6.23.1. The filesystem does not appear to > be corrupt. 
> > > The call chain for the wedged process is: > > [ 1462.911256] nfsd D ffffffff80547840 4760 2966 2 > [ 1462.911283] ffff81010414d4d0 0000000000000046 0000000000000000 ffff81010414d610 > [ 1462.911322] ffff810104cbc6e0 ffff81010414d480 ffffffff80746dc0 ffffffff80746dc0 > [ 1462.911360] ffffffff80744020 ffffffff80746dc0 ffff81010129c140 ffff8101000ad100 > [ 1462.911391] Call Trace: > [ 1462.911417] [] __down+0xe9/0x101 > [ 1462.911437] [] default_wake_function+0x0/0xe > [ 1462.911458] [] __down_failed+0x35/0x3a > [ 1462.911480] [] _xfs_buf_find+0x84/0x24d > [ 1462.911501] [] _xfs_buf_find+0x193/0x24d > [ 1462.911522] [] xfs_buf_lock+0x43/0x45 > [ 1462.911543] [] _xfs_buf_find+0x1ba/0x24d > [ 1462.911564] [] xfs_buf_get_flags+0x5a/0x14b > [ 1462.911586] [] xfs_buf_read_flags+0x12/0x86 > [ 1462.911607] [] xfs_trans_read_buf+0x4c/0x2cf > [ 1462.911629] [] xfs_da_do_buf+0x41b/0x65b > [ 1462.911652] [] xfs_da_read_buf+0x24/0x29 > [ 1462.911673] [] xfs_dir2_block_lookup_int+0x4d/0x1ab > [ 1462.911694] [] xfs_dir2_block_lookup_int+0x4d/0x1ab > [ 1462.911717] [] xfs_dir2_block_lookup+0x15/0x8e > [ 1462.911738] [] xfs_dir_lookup+0xd2/0x12c > [ 1462.911761] [] submit_bio+0x10d/0x114 > [ 1462.911781] [] xfs_dir_lookup_int+0x2c/0xc5 > [ 1462.911802] [] lockdep_init_map+0x90/0x495 > [ 1462.911823] [] xfs_lookup+0x44/0x6f > [ 1462.911843] [] xfs_vn_lookup+0x29/0x60 > [ 1462.915246] [] __lookup_hash+0xe5/0x109 > [ 1462.915267] [] lookup_one_len+0x41/0x4e > [ 1462.915289] [] compose_entry_fh+0xc1/0x117 > [ 1462.915311] [] encode_entry+0x17c/0x38b > [ 1462.915333] [] find_or_create_page+0x3f/0xc9 > [ 1462.915355] [] _xfs_buf_lookup_pages+0x2c1/0x2f6 > [ 1462.915377] [] _spin_unlock+0x1f/0x49 > [ 1462.915399] [] cache_alloc_refill+0x1ba/0x4b9 > [ 1462.915424] [] nfs3svc_encode_entry_plus+0x0/0x13 > [ 1462.915448] [] nfs3svc_encode_entry_plus+0x10/0x13 > [ 1462.915469] [] xfs_dir2_block_getdents+0x15b/0x1e2 > [ 1462.915491] [] nfs3svc_encode_entry_plus+0x0/0x13 > [ 1462.915514] [] 
nfs3svc_encode_entry_plus+0x0/0x13 > [ 1462.915534] [] xfs_readdir+0x91/0xb6 > [ 1462.915557] [] nfs3svc_encode_entry_plus+0x0/0x13 > [ 1462.915579] [] xfs_file_readdir+0x31/0x40 > [ 1462.915599] [] vfs_readdir+0x61/0x93 > [ 1462.915619] [] nfs3svc_encode_entry_plus+0x0/0x13 > [ 1462.915642] [] nfsd_readdir+0x6d/0xc5 > [ 1462.915663] [] nfsd3_proc_readdirplus+0x114/0x204 > [ 1462.915686] [] nfsd_dispatch+0xde/0x1b6 > [ 1462.915706] [] svc_process+0x3f8/0x717 > [ 1462.915729] [] nfsd+0x1a9/0x2c1 > [ 1462.915749] [] child_rip+0xa/0x12 > [ 1462.915769] [] __svc_create_thread+0xea/0x1eb > [ 1462.915792] [] nfsd+0x0/0x2c1 > [ 1462.915812] [] child_rip+0x0/0x12 > > Over time other processes pile up beind this. > > [ 1462.910728] nfsd D ffffffffffffffff 5440 2965 2 > [ 1462.910769] ffff8101040cdd40 0000000000000046 0000000000000001 ffff810103471900 > [ 1462.910812] ffff8101029a72c0 ffff8101040cdcf0 ffffffff80746dc0 ffffffff80746dc0 > [ 1462.910852] ffffffff80744020 ffffffff80746dc0 ffff81010008e0c0 ffff8101012a1040 > [ 1462.910882] Call Trace: > [ 1462.910909] [] nfsd_permission+0x95/0xeb > [ 1462.910931] [] vfs_readdir+0x46/0x93 > [ 1462.910950] [] mutex_lock_nested+0x165/0x27c > [ 1462.910971] [] _spin_unlock+0x1f/0x49 > [ 1462.910994] [] nfs3svc_encode_entry_plus+0x0/0x13 > [ 1462.911015] [] vfs_readdir+0x46/0x93 > [ 1462.911037] [] nfs3svc_encode_entry_plus+0x0/0x13 > [ 1462.911057] [] nfsd_readdir+0x6d/0xc5 > [ 1462.911079] [] nfsd3_proc_readdirplus+0x114/0x204 > [ 1462.911102] [] nfsd_dispatch+0xde/0x1b6 > [ 1462.911122] [] svc_process+0x3f8/0x717 > [ 1462.911143] [] nfsd+0x1a9/0x2c1 > [ 1462.911165] [] child_rip+0xa/0x12 > [ 1462.911184] [] __svc_create_thread+0xea/0x1eb > [ 1462.911206] [] nfsd+0x0/0x2c1 > [ 1462.911225] [] child_rip+0x0/0x12 > > > Any suggestions other than to bisect this? (Bisection might be > painful as it crosses the x86-merge.) 
> - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > From owner-xfs@oss.sgi.com Wed Nov 14 06:59:55 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 14 Nov 2007 07:00:03 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.3 required=5.0 tests=BAYES_00,MIME_8BIT_HEADER autolearn=no version=3.3.0-r574664 Received: from an-out-0708.google.com (an-out-0708.google.com [209.85.132.244]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAEExpa7029544 for ; Wed, 14 Nov 2007 06:59:54 -0800 Received: by an-out-0708.google.com with SMTP id d11so74181and for ; Wed, 14 Nov 2007 06:59:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition; bh=GRYT/RxeuudOpOw39J+W0U/c4hWjYFbPipPSThl/9xc=; b=Zl4toXungIzgjgjZ/RP0BTLUgCqkm4pv4IIf9HrZNgNe7+9UtapEDDJ4DfTIs6cO84Fe2jUwlHZv5lTQOqF0/EcNOKqgZOw5plBzgIg08DMPA9Ap0cNy6bOpio50mwwY+Gp5DEG6ndxQDMn+dngMZm1lI4eTBx7nqJobs2kC8jU= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition; b=Bem+dLLP6CpHsdi+FAxI44lWRMwU8m7jTZrXCAFInSXxLtiV63oIIfkzgmukEGp6HVn3ka3+lxpkpCQng5LX5pM7vRD0lUvosrxZPBLNO+dGcwxVppNsFdvZNK6kim5vocrTjeYd88K0QtPAQH7vvTUSHa+xQL4c0m/VfSmoiRg= Received: by 10.142.203.13 with SMTP id a13mr1169843wfg.1195050953749; Wed, 14 Nov 2007 06:35:53 -0800 (PST) Received: by 10.142.253.5 with HTTP; Wed, 14 Nov 2007 06:35:53 -0800 (PST) Message-ID: Date: Wed, 14 Nov 2007 15:35:53 +0100 From: "=?ISO-8859-2?Q?Rafa=B3_Rzepecki?=" To: xfs@oss.sgi.com 
Subject: Possible bug in XFS causing scheduling while atomic BUG in Linux 2.6.23 MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-2 Content-Disposition: inline X-Virus-Scanned: ClamAV 0.91.2/4778/Wed Nov 14 05:34:46 2007 on oss.sgi.com X-Virus-Status: Clean Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by oss.sgi.com id lAEExta7029549 X-archive-position: 13661 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: divided.mind@gmail.com Precedence: bulk X-list: xfs Hi! Please forgive me if it's not the right place to report that (and do point me to the right place in that case); I also apologize if this report is missing any important information: this is my first time reporting a kernel bug. Anyway, I've run into a kernel bug, and from the trace I assume it's xfs's fault. Relevant messages: Nov 14 14:47:18 [kernel] BUG: scheduling while atomic: mlnet/0x00000002/11031 Nov 14 14:47:18 [kernel] [] __sched_text_start+0x2be/0x397 Nov 14 14:47:18 [kernel] [] __down+0x7e/0x112 Nov 14 14:47:18 [kernel] [] default_wake_function+0x0/0xc Nov 14 14:47:18 [kernel] [] __down_failed+0x7/0xc Nov 14 14:47:18 [kernel] [] xfs_buf_lock+0x2d/0x35 Nov 14 14:47:18 [kernel] [] xfs_getsb+0x25/0x32 Nov 14 14:47:18 [kernel] [] xfs_trans_getsb+0x2d/0x8f Nov 14 14:47:18 [kernel] [] xfs_trans_apply_sb_deltas+0x15/0x5cd Nov 14 14:47:18 [kernel] [] xfs_alloc_mark_busy+0xa3/0xb1 Nov 14 14:47:18 [kernel] [] _xfs_trans_commit+0x90/0x32c Nov 14 14:47:18 [kernel] [] xfs_free_extent+0xcd/0xe3 Nov 14 14:47:18 [kernel] [] kmem_zone_alloc+0x45/0xbc Nov 14 14:47:18 [kernel] [] xfs_bmap_finish+0x11e/0x169 Nov 14 14:47:18 [kernel] [] kmem_zone_zalloc+0x26/0x4f Nov 14 14:47:18 [kernel] [] xfs_itruncate_finish+0x168/0x3f8 Nov 14 14:47:18 [kernel] [] xfs_trans_ijoin+0x2b/0x86 Nov 14 14:47:18 [kernel] [] xfs_free_eofblocks+0x2a4/0x2db Nov 14 14:47:18 [kernel] [] xfs_release+0x1b1/0x21b Nov 14 14:47:18 [kernel] 
[] d_kill+0x40/0x52 Nov 14 14:47:18 [kernel] [] xfs_file_release+0x13/0x1a Nov 14 14:47:18 [kernel] [] __fput+0x14d/0x17d Nov 14 14:47:18 [kernel] [] filp_close+0x3c/0x7b Nov 14 14:47:18 [kernel] [] close_files+0x71/0x93 Nov 14 14:47:18 [kernel] [] put_files_struct+0x27/0x6b Nov 14 14:47:18 [kernel] [] do_exit+0x14d/0x43f Nov 14 14:47:18 [kernel] [] do_trap+0x0/0x10c Nov 14 14:47:18 [kernel] [] do_page_fault+0x2c5/0x62c Nov 14 14:47:18 [kernel] [] do_page_fault+0x0/0x62c Nov 14 14:47:18 [kernel] [] error_code+0x6a/0x70 Nov 14 14:47:18 [kernel] [] __wake_up_common+0x13/0x56 Nov 14 14:47:18 [kernel] [] need_resched+0x1f/0x21 Nov 14 14:47:18 [kernel] [] __wake_up+0x31/0x63 Nov 14 14:47:18 [kernel] [] xfs_buf_unpin+0x29/0x2d Nov 14 14:47:18 [kernel] [] xfs_buf_item_unpin+0x34/0xc2 Nov 14 14:47:18 [kernel] [] xfs_trans_chunk_committed+0x11a/0x156 Nov 14 14:47:18 [kernel] [] xfs_trans_committed+0xdb/0xea Nov 14 14:47:18 [kernel] [] xlog_state_do_callback+0x148/0x2e3 Nov 14 14:47:18 [kernel] [] xfs_buf_iodone_work+0x24/0x2f Nov 14 14:47:18 [kernel] [] xfs_buf_iorequest+0x63/0x65 Nov 14 14:47:18 [kernel] [] xlog_bdstrat_cb+0x16/0x3d Nov 14 14:47:18 [kernel] [] xlog_sync+0x206/0x483 Nov 14 14:47:18 [kernel] [] xfs_trans_tail_ail+0x16/0x44 Nov 14 14:47:18 [kernel] [] xlog_state_sync_all+0x1b0/0x269 Nov 14 14:47:18 [kernel] [] xfs_trans_read_buf+0xcb/0x34e Nov 14 14:47:18 [kernel] [] _xfs_log_force+0x58/0x5e Nov 14 14:47:18 [kernel] [] xfs_iget_core+0x4ca/0x74f Nov 14 14:47:18 [kernel] [] xfs_fs_alloc_inode+0xf/0x1c Nov 14 14:47:18 [kernel] [] xfs_iget+0xc0/0x11e Nov 14 14:47:18 [kernel] [] xfs_trans_iget+0xc2/0x16b Nov 14 14:47:18 [kernel] [] xfs_ialloc+0xcf/0x5d7 Nov 14 14:47:18 [kernel] [] xfs_dir_ialloc+0x85/0x2aa Nov 14 14:47:18 [kernel] [] xfs_trans_reserve+0x80/0x1e6 Nov 14 14:47:18 [kernel] [] xfs_create+0x248/0x684 Nov 14 14:47:18 [kernel] [] xfs_acl_vhasacl_default+0x3a/0x4b Nov 14 14:47:18 [kernel] [] xfs_vn_mknod+0x246/0x370 Nov 14 14:47:18 [kernel] [] 
xfs_vn_permission+0xf/0x13 Nov 14 14:47:18 [kernel] [] xfs_vn_permission+0x0/0x13 Nov 14 14:47:18 [kernel] [] permission+0xd6/0xe9 Nov 14 14:47:18 [kernel] [] vfs_create+0x8a/0xd8 Nov 14 14:47:18 [kernel] [] open_namei_create+0x53/0xae Nov 14 14:47:18 [kernel] [] open_namei+0x576/0x589 Nov 14 14:47:18 [kernel] [] do_filp_open+0x2e/0x5b Nov 14 14:47:18 [kernel] [] get_unused_fd_flags+0x31/0xde Nov 14 14:47:18 [kernel] [] do_sys_open+0x50/0xdd Nov 14 14:47:18 [kernel] [] sys_open+0x1c/0x20 Nov 14 14:47:18 [kernel] [] sysenter_past_esp+0x5f/0x85 Nov 14 14:47:18 [kernel] ======================= I haven't been doing anything in particular when that showed up, so I can't say what could have caused it. Anything more I can tell is that the system has been spinning with that bug for at least an hour: I have relatively short logs, so the earliest saved message is from 13:50 and it's exactly the same trace. Output from ver_linux: Linux divide 2.6.23-gentoo #5 PREEMPT Thu Nov 1 03:29:13 CET 2007 i686 Intel(R) Celeron(R) CPU 2.26GHz GenuineIntel GNU/Linux Gnu C 3.4.3-20050110 Gnu make 3.81 binutils 2.17 util-linux 2.13 mount 2.13 module-init-tools 3.1 e2fsprogs 1.36 reiserfsprogs 3.6.12 xfsprogs 2.6.25 PPP 2.4.1 Linux C Library 2.6 Dynamic linker (ldd) 2.6 Procps 3.1.11 Net-tools 1.60 Kbd 1.08 oprofile 0.9.3 Sh-utils 6.9 udev 103 wireless-tools 29 Modules Loaded lp irtty_sir sir_dev ns558 gameport rtc_cmos nvidia r8180 ieee80211_rtl ieee80211_crypt_rtl If I can provide any more information, please don't hesitate to ask. Note that I'm using the gentoo flavour of kernel, so if I should report this first to Gentoo, do tell me that. ps. I'm not subscribed to the list, so I'd appreciate if any follow-ups were CCed to me. 
-- Kind regards - Rafał Rzepecki From owner-xfs@oss.sgi.com Wed Nov 14 07:05:12 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 14 Nov 2007 07:05:21 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-0.9 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_45, J_CHICKENPOX_52,J_CHICKENPOX_62 autolearn=no version=3.3.0-r574664 Received: from SVITS26.main.ad.rit.edu (svits26.main.ad.rit.edu [129.21.18.136]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAEF5AVN030737 for ; Wed, 14 Nov 2007 07:05:11 -0800 X-MimeOLE: Produced By Microsoft Exchange V6.5 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Subject: RE: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c Date: Wed, 14 Nov 2007 10:05:52 -0500 Message-ID: <06CCEA2EB1B80A4A937ED59005FA855101AED622@svits26.main.ad.rit.edu> In-Reply-To: <06CCEA2EB1B80A4A937ED59005FA855101AED213@svits26.main.ad.rit.edu> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c Thread-Index: Acgc+HCNQCFWOWFQTvaSJTeWO28rWgAWfz9QAAMsAYACW7/4IA== From: "Jay Sullivan" To: Cc: "Jay Sullivan" X-Virus-Scanned: ClamAV 0.91.2/4778/Wed Nov 14 05:34:46 2007 on oss.sgi.com X-Virus-Status: Clean Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id lAEF5CVN030750 X-archive-position: 13662 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jpspgd@rit.edu Precedence: bulk X-list: xfs Of course this had to happen one more time before my scheduled maintenance window... Anyways, here's all of the good stuff I collected. Can anyone make sense of it? Oh, and I upgraded to xfsprogs 2.9.4 last week, so all output you see is with that version. Thanks!
################### dmesg output ################### XFS internal error XFS_WANT_CORRUPTED_GOTO at line 4533 of file fs/xfs/xfs_bmap.c. Caller 0xc028c5a2 [] xfs_bmap_read_extents+0x3bd/0x498 [] xfs_iread_extents+0x74/0xe1 [] xfs_iext_realloc_direct+0xa4/0xe7 [] xfs_iext_add+0x138/0x272 [] xfs_iread_extents+0x74/0xe1 [] xfs_bmapi+0x1ca/0x173f [] elv_rb_add+0x6f/0x88 [] as_update_rq+0x32/0x72 [] as_add_request+0x76/0xa4 [] elv_insert+0xd5/0x142 [] __make_request+0xc8/0x305 [] generic_make_request+0x122/0x1d9 [] __map_bio+0x33/0xa9 [] __clone_and_map+0xda/0x34c [] mempool_alloc+0x2a/0xdb [] xfs_ilock+0x58/0xa0 [] xfs_iomap+0x216/0x4b7 [] __xfs_get_blocks+0x6b/0x226 [] radix_tree_node_alloc+0x16/0x57 [] radix_tree_insert+0xb0/0x126 [] xfs_get_blocks+0x28/0x2d [] block_read_full_page+0x192/0x346 [] xfs_get_blocks+0x0/0x2d [] xfs_iget+0x145/0x150 [] do_mpage_readpage+0x530/0x621 [] xfs_iunlock+0x43/0x84 [] xfs_vget+0xe1/0xf2 [] find_exported_dentry+0x71/0x4b6 [] __do_page_cache_readahead+0x88/0x153 [] mpage_readpage+0x4b/0x5e [] xfs_get_blocks+0x0/0x2d [] blockable_page_cache_readahead+0x4d/0xb9 [] page_cache_readahead+0x174/0x1a3 [] find_get_page+0x18/0x3a [] do_generic_mapping_read+0x1b5/0x535 [] __capable+0x8/0x1b [] generic_file_sendfile+0x68/0x83 [] nfsd_read_actor+0x0/0x10f [] xfs_sendfile+0x94/0x164 [] nfsd_read_actor+0x0/0x10f [] nfsd_permission+0x6e/0x103 [] xfs_file_sendfile+0x4c/0x5c [] nfsd_read_actor+0x0/0x10f [] nfsd_vfs_read+0x344/0x361 [] nfsd_read_actor+0x0/0x10f [] nfsd_read+0xd8/0xf9 [] nfsd3_proc_read+0xb0/0x174 [] nfs3svc_decode_readargs+0x0/0xf7 [] nfsd_dispatch+0x8a/0x1f5 [] svcauth_unix_set_client+0x11d/0x175 [] svc_process+0x4fd/0x681 [] nfsd+0x163/0x273 [] nfsd+0x0/0x273 [] kernel_thread_helper+0x7/0x10 ======================= attempt to access beyond end of device dm-1: rw=0, want=6763361770196172808, limit=7759462400 I/O error in filesystem ("dm-1") meta-data dev dm-1 block 0x5ddc49b238000000 ("xfs_trans_read_buf") error 5 buf count 4096 
xfs_force_shutdown(dm-1,0x1) called from line 415 of file fs/xfs/xfs_trans_buf.c. Return address = 0xc02baa25 Filesystem "dm-1": I/O Error Detected. Shutting down filesystem: dm-1 Please umount the filesystem, and rectify the problem(s) ####################### At this point I umount'ed and mount'ed the FS several times, but xfs_repair still told me to use -L... Any ideas? ####################### server-files ~ # umount /mnt/san/ server-files ~ # mount /mnt/san/ server-files ~ # umount /mnt/san/ server-files ~ # xfs_repair /dev/server-files-sanvg01/server-files-sanlv01 Phase 1 - find and verify superblock... Phase 2 - using internal log - zero log... ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this. server-files ~ # xfs_repair -L /dev/server-files-sanvg01/server-files-sanlv01 Phase 1 - find and verify superblock... Phase 2 - using internal log - zero log... ALERT: The filesystem has valuable metadata changes in a log which is being destroyed because the -L option was used. - scan filesystem freespace and inode maps... - found root inode chunk Phase 3 - for each AG... - scan and clear agi unlinked lists... - process known inodes and perform inode discovery... 
- agno = 0 4002: Badness in key lookup (length) bp=(bno 2561904, len 16384 bytes) key=(bno 2561904, len 8192 bytes) 8003: Badness in key lookup (length) bp=(bno 0, len 512 bytes) key=(bno 0, len 4096 bytes) bad bmap btree ptr 0x5f808b0400000000 in ino 5123809 bad data fork in inode 5123809 cleared inode 5123809 bad magic # 0x58465342 in inode 7480148 (data fork) bmbt block 0 bad data fork in inode 7480148 cleared inode 7480148 - agno = 1 - agno = 2 - agno = 3 - agno = 4 - process newly discovered inodes... Phase 4 - check for duplicate blocks... - setting up duplicate extent list... - check for inodes claiming duplicate blocks... - agno = 0 - agno = 1 - agno = 2 - agno = 3 - agno = 4 entry "Fuller_RotoscopeCorrected.mov" at block 0 offset 184 in directory inode 8992373 references free inode 7480148 clearing inode number in entry at offset 184... Phase 5 - rebuild AG headers and trees... - reset superblock... 4000: Badness in key lookup (length) bp=(bno 0, len 4096 bytes) key=(bno 0, len 512 bytes) Phase 6 - check inode connectivity... - resetting contents of realtime bitmap and summary inodes - traversing filesystem ... bad hash table for directory inode 8992373 (no data entry): rebuilding rebuilding directory inode 8992373 4000: Badness in key lookup (length) bp=(bno 0, len 4096 bytes) key=(bno 0, len 512 bytes) 4000: Badness in key lookup (length) bp=(bno 0, len 4096 bytes) key=(bno 0, len 512 bytes) - traversal finished ... - moving disconnected inodes to lost+found ... Phase 7 - verify and correct link counts... 4000: Badness in key lookup (length) bp=(bno 0, len 4096 bytes) key=(bno 0, len 512 bytes) done server-files ~ # mount /mnt/san server-files ~ # umount /mnt/san server-files ~ # xfs_repair -L /dev/server-files-sanvg01/server-files-sanlv01 Phase 1 - find and verify superblock... Phase 2 - using internal log - zero log... server-files ~ # xfs_repair /dev/server-files-sanvg01/server-files-sanlv01 Phase 1 - find and verify superblock... 
Phase 2 - using internal log - zero log... XFS: totally zeroed log - scan filesystem freespace and inode maps... - found root inode chunk Phase 3 - for each AG... - scan and clear agi unlinked lists... - process known inodes and perform inode discovery... - agno = 0 - agno = 1 - agno = 2 - agno = 3 - agno = 4 - process newly discovered inodes... Phase 4 - check for duplicate blocks... - setting up duplicate extent list... - check for inodes claiming duplicate blocks... - agno = 0 - agno = 1 - agno = 2 - agno = 3 - agno = 4 Phase 5 - rebuild AG headers and trees... - reset superblock... Phase 6 - check inode connectivity... - resetting contents of realtime bitmap and summary inodes - traversing filesystem ... - traversal finished ... - moving disconnected inodes to lost+found ... Phase 7 - verify and correct link counts... done ################ So that's it for now. Next week I'll be rsyncing all of the data off of this volume to another array. I still want to know what's happening, though... *pout* Anyways, thanks a lot for everyone's help. ~Jay -----Original Message----- From: xfs-bounce@oss.sgi.com [mailto:xfs-bounce@oss.sgi.com] On Behalf Of Jay Sullivan Sent: Friday, November 02, 2007 10:49 AM To: xfs@oss.sgi.com Subject: RE: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c What can I say about Murphy and his silly laws? I just had a drive fail on my array. I wonder if this is the root of my problems... Yay parity. ~Jay -----Original Message----- From: xfs-bounce@oss.sgi.com [mailto:xfs-bounce@oss.sgi.com] On Behalf Of Jay Sullivan Sent: Friday, November 02, 2007 10:00 AM To: xfs@oss.sgi.com Subject: RE: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c I lost the xfs_repair output on an xterm with only four lines of scrollback... I'll definitely be more careful to preserve more 'evidence' next time. =( "Pics or it didn't happen", right? 
I just upgraded xfsprogs and will scan the disk during my next scheduled downtime (probably in about 2 weeks). I'm tempted to just wipe the volume and start over: I have enough 'spare' space lying around to copy everything out to a fresh XFS volume. Regarding "areca": I'm using hardware RAID built into Apple XServe RAIDs o'er LSI FC929X cards. Someone else offered the likely explanation that the btree is corrupted. Isn't this something xfs_repair should be able to fix? Would it be easier, safer, and faster to move the data to a new volume (and restore corrupted files if/as I find them from backup)? We're talking about just less than 4TB of data which used to take about 6 hours to fsck (one pass) with ext3. Restoring the whole shebang from backups would probably take the better part of 12 years (waiting for compression, resetting ACLs, etc.)... FWIW, another (way less important,) much busier and significantly larger logical volume on the same array has been totally fine. Murphy--go figure. Thanks! -----Original Message----- From: Eric Sandeen [mailto:sandeen@sandeen.net] Sent: Thursday, November 01, 2007 10:30 PM To: Jay Sullivan Cc: xfs@oss.sgi.com Subject: Re: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c Jay Sullivan wrote: > Good eye: it wasn't mountable, thus the -L flag. No recent > (unplanned) power outages. The machine and the array that holds the > disks are both on serious batteries/UPS and the array's cache > batteries are in good health. Did you have the xfs_repair output to see what it found? You might also grab the very latest xfsprogs (2.9.4) in case it's catching more cases. I hate it when people suggest running memtest86, but I might do that anyway. :) What controller are you using? If you say "areca" I might be on to something with some other bugs I've seen... 
-Eric From owner-xfs@oss.sgi.com Wed Nov 14 07:48:57 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 14 Nov 2007 07:49:06 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAEFmuhM004844 for ; Wed, 14 Nov 2007 07:48:57 -0800 Received: from hch by pentafluge.infradead.org with local (Exim 4.63 #1 (Red Hat Linux)) id 1IsKBd-0001Cz-0b; Wed, 14 Nov 2007 15:29:53 +0000 Date: Wed, 14 Nov 2007 15:29:52 +0000 From: Christoph Hellwig To: Chris Wedgwood Cc: linux-xfs@oss.sgi.com, LKML Subject: Re: 2.6.24-rc2 XFS nfsd hang Message-ID: <20071114152952.GA4210@infradead.org> References: <20071114070400.GA25708@puku.stupidest.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071114070400.GA25708@puku.stupidest.org> User-Agent: Mutt/1.4.2.3i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-Virus-Scanned: ClamAV 0.91.2/4778/Wed Nov 14 05:34:46 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13663 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs On Tue, Nov 13, 2007 at 11:04:00PM -0800, Chris Wedgwood wrote: > With 2.6.24-rc2 (amd64) I sometimes (usually but perhaps not always) > see a hang when accessing some NFS exported XFS filesystems. Local > access to these filesystems ahead of time works without problems. > > This does not occur with 2.6.23.1. The filesystem does not appear to > be corrupt. 
> > [ 1462.911360] ffffffff80744020 ffffffff80746dc0 ffff81010129c140 ffff8101000ad100 > [ 1462.911391] Call Trace: > [ 1462.911417] [] __down+0xe9/0x101 > [ 1462.911437] [] default_wake_function+0x0/0xe > [ 1462.911458] [] __down_failed+0x35/0x3a > [ 1462.911480] [] _xfs_buf_find+0x84/0x24d > [ 1462.911501] [] _xfs_buf_find+0x193/0x24d > [ 1462.911522] [] xfs_buf_lock+0x43/0x45 this is bp->b_sema, which lookup wants. > [ 1462.915534] [] xfs_readdir+0x91/0xb6 > [ 1462.915557] [] nfs3svc_encode_entry_plus+0x0/0x13 > [ 1462.915579] [] xfs_file_readdir+0x31/0x40 > [ 1462.915599] [] vfs_readdir+0x61/0x93 > [ 1462.915619] [] nfs3svc_encode_entry_plus+0x0/0x13 > [ 1462.915642] [] nfsd_readdir+0x6d/0xc5 and this is the nasty nfsd case where a filldir callback calls back into lookup. I suspect we're somehow holding b_sema already. Previously this was okay because we weren't inside the actual readdir code when calling filldir but operated on a copy of the data. This gem has bitten other filesystems before; I'll see if I can find a way around it.
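The hazard described above can be illustrated with a small user-space sketch. This is not XFS or nfsd code: DirBuffer, readdir_locked, readdir_copy and collect are hypothetical names, and threading.Lock merely stands in for a non-recursive lock such as bp->b_sema. It assumes the callback re-enters lookup on the same buffer, as the traces suggest, and uses a try-lock so the would-be deadlock is reported rather than actually hanging.

```python
import threading


class DirBuffer:
    """Hypothetical directory block guarded by one non-recursive lock."""

    def __init__(self, names):
        self.names = list(names)
        self.lock = threading.Lock()  # stand-in for bp->b_sema (assumption)

    def lookup(self, name):
        # Lookup needs the buffer lock too. A kernel thread would sleep here
        # forever; the try-lock lets this sketch report the problem instead.
        if not self.lock.acquire(blocking=False):
            raise RuntimeError("would deadlock: buffer lock already held")
        try:
            return name in self.names
        finally:
            self.lock.release()

    def readdir_locked(self, filldir):
        # Hazardous pattern: the callback runs while the lock is still held.
        with self.lock:
            for name in self.names:
                filldir(name)

    def readdir_copy(self, filldir):
        # Safer pattern: copy the entries under the lock, drop the lock,
        # and only then hand each entry to the callback.
        with self.lock:
            snapshot = list(self.names)
        for name in snapshot:
            filldir(name)


def collect(dirbuf, readdir):
    """Run readdir with a callback that looks every entry up again."""
    seen = []

    def filldir(name):
        # Mimics a readdirplus-style encoder calling back into lookup.
        seen.append((name, dirbuf.lookup(name)))

    try:
        readdir(filldir)
    except RuntimeError as err:
        return str(err)
    return seen
```

Calling collect(d, d.readdir_copy) on a DirBuffer d completes normally, while collect(d, d.readdir_locked) reports the re-entrant lock attempt. The copy-first variant corresponds to the older behaviour mentioned above, where filldir operated on a copy of the data.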
From owner-xfs@oss.sgi.com Wed Nov 14 09:39:26 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 14 Nov 2007 09:39:30 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from fieldses.org (mail.fieldses.org [66.93.2.214]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAEHdMH1025016 for ; Wed, 14 Nov 2007 09:39:26 -0800 Received: from bfields by fieldses.org with local (Exim 4.68) (envelope-from ) id 1IsMCx-0006lW-0w; Wed, 14 Nov 2007 12:39:23 -0500 Date: Wed, 14 Nov 2007 12:39:22 -0500 To: Christoph Hellwig Cc: Chris Wedgwood , linux-xfs@oss.sgi.com, LKML Subject: Re: 2.6.24-rc2 XFS nfsd hang Message-ID: <20071114173922.GC14254@fieldses.org> References: <20071114070400.GA25708@puku.stupidest.org> <20071114152952.GA4210@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071114152952.GA4210@infradead.org> User-Agent: Mutt/1.5.17 (2007-11-01) From: "J. Bruce Fields" X-Virus-Scanned: ClamAV 0.91.2/4785/Wed Nov 14 08:26:20 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13664 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: bfields@fieldses.org Precedence: bulk X-list: xfs On Wed, Nov 14, 2007 at 03:29:52PM +0000, Christoph Hellwig wrote: > On Tue, Nov 13, 2007 at 11:04:00PM -0800, Chris Wedgwood wrote: > > With 2.6.24-rc2 (amd64) I sometimes (usually but perhaps not always) > > see a hang when accessing some NFS exported XFS filesystems. Local > > access to these filesystems ahead of time works without problems. > > > > This does not occur with 2.6.23.1. The filesystem does not appear to > > be corrupt. 
> > > > > [ 1462.911360] ffffffff80744020 ffffffff80746dc0 ffff81010129c140 ffff8101000ad100 > > [ 1462.911391] Call Trace: > > [ 1462.911417] [] __down+0xe9/0x101 > > [ 1462.911437] [] default_wake_function+0x0/0xe > > [ 1462.911458] [] __down_failed+0x35/0x3a > > [ 1462.911480] [] _xfs_buf_find+0x84/0x24d > > [ 1462.911501] [] _xfs_buf_find+0x193/0x24d > > [ 1462.911522] [] xfs_buf_lock+0x43/0x45 > > this is bp->b_sema, which lookup wants. > > > [ 1462.915534] [] xfs_readdir+0x91/0xb6 > > [ 1462.915557] [] nfs3svc_encode_entry_plus+0x0/0x13 > > [ 1462.915579] [] xfs_file_readdir+0x31/0x40 > > [ 1462.915599] [] vfs_readdir+0x61/0x93 > > [ 1462.915619] [] nfs3svc_encode_entry_plus+0x0/0x13 > > [ 1462.915642] [] nfsd_readdir+0x6d/0xc5 > > and this is the nasty nfsd case where a filldir callback calls back > into lookup. I suspect we're somehow holding b_sema already. Previously > this was okay because we weren't inside the actual readdir code when > calling filldir but operated on a copy of the data. > > This gem has bitten other filesystems before; I'll see if I can find a > way around it. This must have come up before; feel free to remind me: is there any way to make the interface easier to use? (E.g. would it help if the filldir callback could be passed a dentry?) --b.
From owner-xfs@oss.sgi.com Wed Nov 14 09:44:19 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 14 Nov 2007 09:44:23 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAEHiFSx025847 for ; Wed, 14 Nov 2007 09:44:19 -0800 Received: from hch by pentafluge.infradead.org with local (Exim 4.63 #1 (Red Hat Linux)) id 1IsMHj-00042A-DF; Wed, 14 Nov 2007 17:44:19 +0000 Date: Wed, 14 Nov 2007 17:44:19 +0000 From: Christoph Hellwig To: "J. Bruce Fields" Cc: Christoph Hellwig , Chris Wedgwood , linux-xfs@oss.sgi.com, LKML Subject: Re: 2.6.24-rc2 XFS nfsd hang Message-ID: <20071114174419.GA15271@infradead.org> References: <20071114070400.GA25708@puku.stupidest.org> <20071114152952.GA4210@infradead.org> <20071114173922.GC14254@fieldses.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071114173922.GC14254@fieldses.org> User-Agent: Mutt/1.4.2.3i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-Virus-Scanned: ClamAV 0.91.2/4785/Wed Nov 14 08:26:20 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13665 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs On Wed, Nov 14, 2007 at 12:39:22PM -0500, J. Bruce Fields wrote: > This must have come up before; feel free to remind me: is there any way > to make the interface easier to use? (E.g. would it help if the filldir > callback could be passed a dentry?) 
The best thing for the filesystem would be to have a readdirplus (or have it folded into readdir) instead of calling into lookup from ->filldir. From owner-xfs@oss.sgi.com Wed Nov 14 09:53:18 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 14 Nov 2007 09:53:22 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from fieldses.org (mail.fieldses.org [66.93.2.214]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAEHrHmb027452 for ; Wed, 14 Nov 2007 09:53:18 -0800 Received: from bfields by fieldses.org with local (Exim 4.68) (envelope-from ) id 1IsMQU-00071u-6V; Wed, 14 Nov 2007 12:53:22 -0500 Date: Wed, 14 Nov 2007 12:53:22 -0500 To: Christoph Hellwig Cc: Chris Wedgwood , linux-xfs@oss.sgi.com, LKML Subject: Re: 2.6.24-rc2 XFS nfsd hang Message-ID: <20071114175322.GD14254@fieldses.org> References: <20071114070400.GA25708@puku.stupidest.org> <20071114152952.GA4210@infradead.org> <20071114173922.GC14254@fieldses.org> <20071114174419.GA15271@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071114174419.GA15271@infradead.org> User-Agent: Mutt/1.5.17 (2007-11-01) From: "J. Bruce Fields" X-Virus-Scanned: ClamAV 0.91.2/4785/Wed Nov 14 08:26:20 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13666 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: bfields@fieldses.org Precedence: bulk X-list: xfs On Wed, Nov 14, 2007 at 05:44:19PM +0000, Christoph Hellwig wrote: > On Wed, Nov 14, 2007 at 12:39:22PM -0500, J. Bruce Fields wrote: > > This must have come up before; feel free to remind me: is there any way > > to make the interface easier to use? (E.g. would it help if the filldir > > callback could be passed a dentry?) 
> > The best thing for the filesystem would be to have a readdirplus > (or have it folded into readdir) instead of calling into lookup > from ->filldir. And the readdirplus would pass a dentry to its equivalent of ->filldir? Or something else? --b. From owner-xfs@oss.sgi.com Wed Nov 14 10:02:40 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 14 Nov 2007 10:02:45 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAEI2bPP029134 for ; Wed, 14 Nov 2007 10:02:40 -0800 Received: from hch by pentafluge.infradead.org with local (Exim 4.63 #1 (Red Hat Linux)) id 1IsMZV-0004TI-GE; Wed, 14 Nov 2007 18:02:41 +0000 Date: Wed, 14 Nov 2007 18:02:41 +0000 From: Christoph Hellwig To: "J. Bruce Fields" Cc: Christoph Hellwig , Chris Wedgwood , linux-xfs@oss.sgi.com, LKML Subject: Re: 2.6.24-rc2 XFS nfsd hang Message-ID: <20071114180241.GA16656@infradead.org> References: <20071114070400.GA25708@puku.stupidest.org> <20071114152952.GA4210@infradead.org> <20071114173922.GC14254@fieldses.org> <20071114174419.GA15271@infradead.org> <20071114175322.GD14254@fieldses.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071114175322.GD14254@fieldses.org> User-Agent: Mutt/1.4.2.3i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-Virus-Scanned: ClamAV 0.91.2/4785/Wed Nov 14 08:26:20 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13667 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs On Wed, Nov 14, 2007 at 12:53:22PM -0500, J. 
Bruce Fields wrote: > On Wed, Nov 14, 2007 at 05:44:19PM +0000, Christoph Hellwig wrote: > > On Wed, Nov 14, 2007 at 12:39:22PM -0500, J. Bruce Fields wrote: > > > This must have come up before; feel free to remind me: is there any way > > > to make the interface easier to use? (E.g. would it help if the filldir > > > callback could be passed a dentry?) > > > > The best thing for the filesystem would be to have a readdirplus > > (or have it folded into readdir) instead of calling into lookup > > from ->filldir. > > And the readdirplus would pass a dentry to its equivalent of ->filldir? > Or something else? Personally I'd prefer it to only grow a struct stat, or rather its members. But the nfsd code currently expects a dentry, so this might require some major refactoring. From owner-xfs@oss.sgi.com Wed Nov 14 10:08:36 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 14 Nov 2007 10:08:38 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from fieldses.org (mail.fieldses.org [66.93.2.214]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAEI8Y5R030302 for ; Wed, 14 Nov 2007 10:08:35 -0800 Received: from bfields by fieldses.org with local (Exim 4.68) (envelope-from ) id 1IsMfG-0007Jp-DT; Wed, 14 Nov 2007 13:08:38 -0500 Date: Wed, 14 Nov 2007 13:08:38 -0500 To: Christoph Hellwig Cc: Chris Wedgwood , linux-xfs@oss.sgi.com, LKML Subject: Re: 2.6.24-rc2 XFS nfsd hang Message-ID: <20071114180838.GE14254@fieldses.org> References: <20071114070400.GA25708@puku.stupidest.org> <20071114152952.GA4210@infradead.org> <20071114173922.GC14254@fieldses.org> <20071114174419.GA15271@infradead.org> <20071114175322.GD14254@fieldses.org> <20071114180241.GA16656@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To:
<20071114180241.GA16656@infradead.org> User-Agent: Mutt/1.5.17 (2007-11-01) From: "J. Bruce Fields" X-Virus-Scanned: ClamAV 0.91.2/4785/Wed Nov 14 08:26:20 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13668 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: bfields@fieldses.org Precedence: bulk X-list: xfs On Wed, Nov 14, 2007 at 06:02:41PM +0000, Christoph Hellwig wrote: > On Wed, Nov 14, 2007 at 12:53:22PM -0500, J. Bruce Fields wrote: > > On Wed, Nov 14, 2007 at 05:44:19PM +0000, Christoph Hellwig wrote: > > > On Wed, Nov 14, 2007 at 12:39:22PM -0500, J. Bruce Fields wrote: > > > > This must have come up before; feel free to remind me: is there any way > > > > to make the interface easier to use? (E.g. would it help if the filldir > > > > callback could be passed a dentry?) > > > > > > The best thing for the filesystem would be to have a readdirplus > > > (or have it folded into readdir) instead of calling into lookup > > > from ->filldir. > > > > And the readdirplus would pass a dentry to its equivalent of ->filldir? > > Or something else? > > Personally I'd prefer it to only grow a struct stat, or rather its members. > But the nfsd code currently expects a dentry, so this might require some > major refactoring. Well, we need to check for mountpoints, for example, so I don't see any way out of needing a dentry. What's the drawback? --b.
From owner-xfs@oss.sgi.com Wed Nov 14 14:48:44 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 14 Nov 2007 14:48:49 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-0.8 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS autolearn=ham version=3.3.0-r574664 Received: from mail.g-house.de (ns2.g-housing.de [81.169.133.75]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAEMmgvn009715 for ; Wed, 14 Nov 2007 14:48:43 -0800 Received: from [89.54.143.169] (helo=[192.168.178.25]) by mail.g-house.de with esmtpsa (TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.63) (envelope-from ) id 1IsR4t-00042O-Hf; Wed, 14 Nov 2007 23:51:23 +0100 Date: Wed, 14 Nov 2007 23:48:46 +0100 (CET) From: Christian Kujau X-X-Sender: evil@sheep.housecafe.de To: Chris Wedgwood cc: linux-xfs@oss.sgi.com, Christoph Hellwig , David Chinner , LKML , Benny Halevy Subject: Re: 2.6.24-rc2 XFS nfsd hang --- filldir change responsible? 
In-Reply-To: <20071114114907.GA31466@puku.stupidest.org> Message-ID: References: <20071114070400.GA25708@puku.stupidest.org> <20071114114907.GA31466@puku.stupidest.org> User-Agent: Alpine 0.99999 (DEB 796 2007-11-08) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Virus-Scanned: ClamAV 0.91.2/4792/Wed Nov 14 12:35:29 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13669 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lists@nerdbynature.de Precedence: bulk X-list: xfs On Wed, 14 Nov 2007, Chris Wedgwood wrote: > After some bisection pain (sg broken in the middle and XFS not > compiling in other places) the regression seems to be: > > commit 051e7cd44ab8f0f7c2958371485b4a1ff64a8d1b > Author: Christoph Hellwig > Date: Tue Aug 28 13:58:24 2007 +1000 Following a git-bisect howto[0], I tried to revert this commit: # git checkout master # git revert 051e7cd44ab8f0f7c2958371485b4a1ff64a8d1b Auto-merged fs/xfs/linux-2.6/xfs_file.c CONFLICT (content): Merge conflict in fs/xfs/linux-2.6/xfs_file.c Auto-merged fs/xfs/linux-2.6/xfs_vnode.h CONFLICT (content): Merge conflict in fs/xfs/linux-2.6/xfs_vnode.h Auto-merged fs/xfs/xfs_dir2.c CONFLICT (content): Merge conflict in fs/xfs/xfs_dir2.c Auto-merged fs/xfs/xfs_dir2.h Auto-merged fs/xfs/xfs_dir2_block.c Auto-merged fs/xfs/xfs_dir2_sf.c Auto-merged fs/xfs/xfs_vnodeops.c CONFLICT (content): Merge conflict in fs/xfs/xfs_vnodeops.c Automatic revert failed. After resolving the conflicts, mark the corrected paths with 'git add ' and commit the result. Any ideas? Christian [0] is this still up-to-date? http://kernel.org/pub/software/scm/git/docs/v1.4.4.4/howto/isolate-bugs-with-bisect.txt -- BOFH excuse #423: It's not RFC-822 compliant. 
From owner-xfs@oss.sgi.com Wed Nov 14 14:54:41 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 14 Nov 2007 14:54:47 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.2 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS autolearn=ham version=3.3.0-r574664 Received: from mail.g-house.de (ns2.g-housing.de [81.169.133.75]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAEMsdCX010799 for ; Wed, 14 Nov 2007 14:54:41 -0800 Received: from [89.54.143.169] (helo=[192.168.178.25]) by mail.g-house.de with esmtpsa (TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.63) (envelope-from ) id 1IsQnw-0003XJ-0w; Wed, 14 Nov 2007 23:33:52 +0100 Date: Wed, 14 Nov 2007 23:31:12 +0100 (CET) From: Christian Kujau X-X-Sender: evil@sheep.housecafe.de To: "J. Bruce Fields" cc: Benny Halevy , Chris Wedgwood , linux-xfs@oss.sgi.com, LKML Subject: Re: 2.6.24-rc2 XFS nfsd hang In-Reply-To: <20071114125907.GB4010@fieldses.org> Message-ID: References: <20071114070400.GA25708@puku.stupidest.org> <473AA72C.6020308@panasas.com> <20071114125907.GB4010@fieldses.org> User-Agent: Alpine 0.99999 (DEB 796 2007-11-08) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Virus-Scanned: ClamAV 0.91.2/4794/Wed Nov 14 13:06:34 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13670 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lists@nerdbynature.de Precedence: bulk X-list: xfs On Wed, 14 Nov 2007, J. Bruce Fields wrote: > On Wed, Nov 14, 2007 at 09:43:40AM +0200, Benny Halevy wrote: >> I wonder if this is a similar hang to what Christian was seeing here: >> http://lkml.org/lkml/2007/11/13/319 > > Ah, thanks for noticing that. Christian Kujau, is /data an xfs > partition? 
Sorry for the late reply :\ Yes, the nfsd process only got stuck when I did ls(1) (with or without -l) on an NFS share which contained an XFS partition. I did not pay attention to the underlying fs at first, so I just ls'ed my shares and noticed that it got stuck. Now that you mention it I tried again, with a (git-wise) current 2.6 kernel and the same .config: http://nerdbynature.de/bits/2.6.24-rc2/nfsd/ Running ls on an ext3 or jfs backed nfs share did succeed, running ls on an xfs backed nfs share did not. The sysrq-t (see dmesg.2.gz please) looks like yours (to my untrained eye): nfsd D c04131c0 0 8535 2 e7ea97b8 00000046 e7ea9000 c04131c0 e7ea97b8 e697e7e0 00000282 e697e7e8 e7ea97e4 c0409ebc f71f3500 00000001 f71f3500 c0115540 e697e804 e697e804 e697e7e0 8f082000 00000001 e7ea97f4 c0409cc2 00000004 00000062 e7ea9800 Nov 14 23:07:14 sheep kernel: [ 1870.124185] Call Trace: [] __down+0x7c/0xd0 [] __down_failed+0xa/0x10 [] xfs_buf_lock+0x46/0x50 [] _xfs_buf_find+0xf2/0x190 [] xfs_buf_get_flags+0x54/0x120 [] xfs_buf_read_flags+0x1d/0x80 [] xfs_trans_read_buf+0x4a/0x350 [] xfs_da_do_buf+0x409/0x760 [] xfs_da_read_buf+0x2f/0x40 [] xfs_dir2_leaf_lookup_int+0x172/0x270 [] xfs_dir2_leaf_lookup+0x1e/0x90 [] xfs_dir_lookup+0xe4/0x100 [] xfs_dir_lookup_int+0x2e/0x100 [] xfs_lookup+0x62/0x90 [] xfs_vn_lookup+0x34/0x70 [] __lookup_hash+0xb6/0x100 [] lookup_one_len+0x4e/0x50 [] compose_entry_fh+0x59/0x120 [nfsd] [] encode_entry+0x329/0x3c0 [nfsd] [] nfs3svc_encode_entry_plus+0x3b/0x50 [nfsd] [] xfs_dir2_leaf_getdents+0x174/0x900 [] xfs_readdir+0xba/0xd0 [] xfs_file_readdir+0x44/0x70 [] vfs_readdir+0x7e/0xa0 [] nfsd_readdir+0x73/0xe0 [nfsd] [] nfsd3_proc_readdirplus+0xda/0x200 [nfsd] [] nfsd_dispatch+0x11b/0x210 [nfsd] [] svc_process+0x41c/0x760 [sunrpc] [] nfsd+0x164/0x2a0 [nfsd] [] kernel_thread_helper+0x7/0x10 >> Any suggestions other than to bisect this? (Bisection might be >> painful as it crosses the x86-merge.)
Make that "impossible" for me, as I could not boot the bisected kernel and marking versions as "bad" for unrelated things seems to invalidate the results. However, from ~2500 revisions (2.6.24-rc2 to 2.6.23.1) down to ~20 or so in just 10 builds, that's pretty awesome. Christian. -- BOFH excuse #321: Scheduled global CPU outage From owner-xfs@oss.sgi.com Wed Nov 14 19:26:09 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 14 Nov 2007 19:26:13 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS autolearn=ham version=3.3.0-r574664 Received: from sandeen.net (sandeen.net [209.173.210.139]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAF3Q6F6008298 for ; Wed, 14 Nov 2007 19:26:08 -0800 Received: from liberator.sandeen.net (liberator.sandeen.net [10.0.0.4]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by sandeen.net (Postfix) with ESMTP id 3CB6D180286BE; Wed, 14 Nov 2007 21:26:13 -0600 (CST) Message-ID: <473BBC55.7050804@sandeen.net> Date: Wed, 14 Nov 2007 21:26:13 -0600 From: Eric Sandeen User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728) MIME-Version: 1.0 To: Jay Sullivan CC: xfs@oss.sgi.com Subject: Re: xfs_force_shutdown called from file fs/xfs/xfs_trans_buf.c References: <06CCEA2EB1B80A4A937ED59005FA855101AED622@svits26.main.ad.rit.edu> In-Reply-To: <06CCEA2EB1B80A4A937ED59005FA855101AED622@svits26.main.ad.rit.edu> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4796/Wed Nov 14 16:09:59 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13671 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: sandeen@sandeen.net Precedence: bulk X-list: xfs Jay Sullivan wrote: > Of course this had to happen one more time before my 
scheduled > maintenance window... Anyways, here's all of the good stuff I > collected. Can anyone make sense of it? Oh, and I upgraded to xfsprogs > 2.9.4 last week, so all output you see is with that version. > Forgot to ask, are you running w/ 4k stacks? And/or, do you have stack usage debugging enabled? That's quite a backtrace you've got there... just a shot in the dark. -Eric From owner-xfs@oss.sgi.com Wed Nov 14 19:34:10 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 14 Nov 2007 19:34:14 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.4 required=5.0 tests=AWL,BAYES_20,J_CHICKENPOX_62, J_CHICKENPOX_64 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAF3Xwlu010042 for ; Wed, 14 Nov 2007 19:34:03 -0800 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA19570; Thu, 15 Nov 2007 14:33:55 +1100 Message-ID: <473BBDC1.2020107@sgi.com> Date: Thu, 15 Nov 2007 14:32:17 +1100 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.6 (X11/20070728) MIME-Version: 1.0 To: David Chinner CC: xfs-oss , xfs-dev Subject: Re: [PATCH, RFC] Move AIL pushing into a separate thread References: <20071105050706.GW66820511@sgi.com> In-Reply-To: <20071105050706.GW66820511@sgi.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4796/Wed Nov 14 16:09:59 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13672 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs Overall it looks good Dave, just a few comments below. 
David Chinner wrote: > When many hundreds to thousands of threads all try to do simultaneous > transactions and the log is in a tail-pushing situation (i.e. full), > we can get multiple threads walking the AIL list and contending on > the AIL lock. > > Recently we've had two cases of machines basically locking up because > most of the CPUs in the system are trying to obtain the AIL lock. > The first was an 8p machine with ~2,500 kernel threads trying to > do transactions, and the latest is a 2048p altix closing a file per > MPI rank in a synchronised fashion resulting in > 400 processes > all trying to walk and push the AIL at the same time. > > The AIL push is, in effect, a simple I/O dispatch algorithm complicated > by the ordering constraints placed on it by the transaction subsystem. > It really does not need multiple threads to push on it - even when > only a single CPU is pushing the AIL, it can push the I/O out far faster > than pretty much any disk subsystem can handle. > > So, to avoid contention problems stemming from multiple list walkers, > move the list walk off into another thread and simply provide a "target" > to push to. When a thread requires a push, it sets the target and wakes > the push thread, then goes to sleep waiting for the required amount > of space to become available in the log. > > This mechanism should also be a lot fairer under heavy load as the > waiters will queue in arrival order, rather than queuing in "who completed > a push first" order. > > Also, by moving the pushing to a separate thread we can more effectively do > overload detection and prevention as we can keep context from loop iteration > to loop iteration. That is, we can push only part of the list each loop and not > have to loop back to the start of the list every time we run. This should > also help by reducing the number of items we try to lock and/or push > that we cannot move.
> > Note that this patch is not intended to solve the inefficiencies in the > AIL structure and the associated issues with extremely large list contents. > That needs to be addresses separately; parallel access would cause problems > to any new structure as well, so I'm only aiming to isolate the structure > from unbounded parallelism here. > > Signed-Off-By: Dave Chinner > --- > fs/xfs/linux-2.6/xfs_super.c | 60 +++++++++++ > fs/xfs/xfs_log.c | 12 ++ > fs/xfs/xfs_mount.c | 6 - > fs/xfs/xfs_mount.h | 10 + > fs/xfs/xfs_trans.h | 1 > fs/xfs/xfs_trans_ail.c | 231 ++++++++++++++++++++++++++++--------------- > fs/xfs/xfs_trans_priv.h | 8 + > fs/xfs/xfsidbg.c | 12 +- > 8 files changed, 247 insertions(+), 93 deletions(-) > > Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_super.c > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_super.c 2007-11-05 10:39:05.000000000 +1100 > +++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_super.c 2007-11-05 14:48:39.871177707 +1100 > @@ -51,6 +51,7 @@ > #include "xfs_vfsops.h" > #include "xfs_version.h" > #include "xfs_log_priv.h" > +#include "xfs_trans_priv.h" > > #include > #include > @@ -765,6 +766,65 @@ xfs_blkdev_issue_flush( > blkdev_issue_flush(buftarg->bt_bdev, NULL); > } > > +/* > + * XFS AIL push thread support > + */ > +void > +xfsaild_wakeup( > + xfs_mount_t *mp, > + xfs_lsn_t threshold_lsn) > +{ > + > + if (XFS_LSN_CMP(threshold_lsn, mp->m_ail.xa_target) > 0) { > + mp->m_ail.xa_target = threshold_lsn; > + wake_up_process(mp->m_ail.xa_task); > + } > +} > + > +int > +xfsaild( > + void *data) > +{ > + xfs_mount_t *mp = (xfs_mount_t *)data; > + xfs_lsn_t last_pushed_lsn = 0; > + long tout = 0; > + > + while (!kthread_should_stop()) { > + if (tout) > + schedule_timeout_interruptible(msecs_to_jiffies(tout)); > + > + /* swsusp */ > + try_to_freeze(); > + > + /* we're either starting or stopping if there is no log */ > + if (!mp->m_log) > + continue; It's looks like the log should 
never be NULL while the xfsaild thread is running. Could we ASSERT(mp->m_log)? > + > + tout = xfsaild_push(mp, &last_pushed_lsn); > + } > + > + return 0; > +} /* xfsaild */ > + > +void > +xfsaild_start( > + xfs_mount_t *mp) > +{ > + mp->m_ail.xa_target = 0; > + mp->m_ail.xa_task = kthread_run(xfsaild, mp, "xfsaild"); > + ASSERT(!IS_ERR(mp->m_ail.xa_task)); > + /* XXX: should return error but nowhere to do it */ > +} > + > +void > +xfsaild_stop( > + xfs_mount_t *mp) > +{ > + kthread_stop(mp->m_ail.xa_task); > +} > + > + > + > STATIC struct inode * > xfs_fs_alloc_inode( > struct super_block *sb) > Index: 2.6.x-xfs-new/fs/xfs/xfs_log.c > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_log.c 2007-11-02 18:00:19.000000000 +1100 > +++ 2.6.x-xfs-new/fs/xfs/xfs_log.c 2007-11-05 14:07:16.850189316 +1100 > @@ -515,6 +515,12 @@ xfs_log_mount(xfs_mount_t *mp, > mp->m_log = xlog_alloc_log(mp, log_target, blk_offset, num_bblks); > > /* > + * Initialize the AIL now we have a log. > + */ > + spin_lock_init(&mp->m_ail_lock); > + xfs_trans_ail_init(mp); > + > + /* > * skip log recovery on a norecovery mount. pretend it all > * just worked. > */ > @@ -530,7 +536,7 @@ xfs_log_mount(xfs_mount_t *mp, > mp->m_flags |= XFS_MOUNT_RDONLY; > if (error) { > cmn_err(CE_WARN, "XFS: log mount/recovery failed: error %d", error); > - xlog_dealloc_log(mp->m_log); > + xfs_log_unmount_dealloc(mp); > return error; > } > } > @@ -722,10 +728,14 @@ xfs_log_unmount_write(xfs_mount_t *mp) > > /* > * Deallocate log structures for unmount/relocation. > + * > + * We need to stop the aild from running before we destroy > + * and deallocate the log as the aild references the log. 
> */ > void > xfs_log_unmount_dealloc(xfs_mount_t *mp) > { > + xfs_trans_ail_destroy(mp); > xlog_dealloc_log(mp->m_log); > } > > Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.c > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.c 2007-11-02 13:44:50.000000000 +1100 > +++ 2.6.x-xfs-new/fs/xfs/xfs_mount.c 2007-11-05 14:12:22.554601173 +1100 > @@ -137,15 +137,9 @@ xfs_mount_init(void) > mp->m_flags |= XFS_MOUNT_NO_PERCPU_SB; > } > > - spin_lock_init(&mp->m_ail_lock); > spin_lock_init(&mp->m_sb_lock); > mutex_init(&mp->m_ilock); > mutex_init(&mp->m_growlock); > - /* > - * Initialize the AIL. > - */ > - xfs_trans_ail_init(mp); > - > atomic_set(&mp->m_active_trans, 0); > > return mp; > Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.h > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.h 2007-10-16 08:52:58.000000000 +1000 > +++ 2.6.x-xfs-new/fs/xfs/xfs_mount.h 2007-11-05 14:14:42.652456849 +1100 > @@ -219,12 +219,18 @@ extern void xfs_icsb_sync_counters_flags > #define xfs_icsb_sync_counters_flags(mp, flags) do { } while (0) > #endif > > +typedef struct xfs_ail { > + xfs_ail_entry_t xa_ail; > + uint xa_gen; > + struct task_struct *xa_task; > + xfs_lsn_t xa_target; > +} xfs_ail_t; > + > typedef struct xfs_mount { > struct super_block *m_super; > xfs_tid_t m_tid; /* next unused tid for fs */ > spinlock_t m_ail_lock; /* fs AIL mutex */ > - xfs_ail_entry_t m_ail; /* fs active log item list */ > - uint m_ail_gen; /* fs AIL generation count */ > + xfs_ail_t m_ail; /* fs active log item list */ > xfs_sb_t m_sb; /* copy of fs superblock */ > spinlock_t m_sb_lock; /* sb counter lock */ > struct xfs_buf *m_sb_bp; /* buffer for superblock */ > Index: 2.6.x-xfs-new/fs/xfs/xfs_trans.h > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_trans.h 2007-11-02 13:44:46.000000000 +1100 > +++ 2.6.x-xfs-new/fs/xfs/xfs_trans.h 2007-11-05 
14:01:13.205272667 +1100 > @@ -993,6 +993,7 @@ int _xfs_trans_commit(xfs_trans_t *, > #define xfs_trans_commit(tp, flags) _xfs_trans_commit(tp, flags, NULL) > void xfs_trans_cancel(xfs_trans_t *, int); > void xfs_trans_ail_init(struct xfs_mount *); > +void xfs_trans_ail_destroy(struct xfs_mount *); > xfs_lsn_t xfs_trans_push_ail(struct xfs_mount *, xfs_lsn_t); > xfs_lsn_t xfs_trans_tail_ail(struct xfs_mount *); > void xfs_trans_unlocked_item(struct xfs_mount *, > Index: 2.6.x-xfs-new/fs/xfs/xfs_trans_ail.c > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_trans_ail.c 2007-10-02 16:01:48.000000000 +1000 > +++ 2.6.x-xfs-new/fs/xfs/xfs_trans_ail.c 2007-11-05 14:46:44.206327966 +1100 > @@ -57,7 +57,7 @@ xfs_trans_tail_ail( > xfs_log_item_t *lip; > > spin_lock(&mp->m_ail_lock); > - lip = xfs_ail_min(&(mp->m_ail)); > + lip = xfs_ail_min(&(mp->m_ail.xa_ail)); > if (lip == NULL) { > lsn = (xfs_lsn_t)0; > } else { > @@ -71,25 +71,22 @@ xfs_trans_tail_ail( > /* > * xfs_trans_push_ail > * > - * This routine is called to move the tail of the AIL > - * forward. It does this by trying to flush items in the AIL > - * whose lsns are below the given threshold_lsn. > + * This routine is called to move the tail of the AIL forward. It does this by > + * trying to flush items in the AIL whose lsns are below the given > + * threshold_lsn. > * > - * The routine returns the lsn of the tail of the log. > + * the push is run asynchronously in a separate thread, so we return the tail > + * of the log right now instead of the tail after the push. This means we will > + * either continue right away, or we will sleep waiting on the async thread to > + * do it's work. 
> */ > xfs_lsn_t > xfs_trans_push_ail( > xfs_mount_t *mp, > xfs_lsn_t threshold_lsn) > { > - xfs_lsn_t lsn; > xfs_log_item_t *lip; > int gen; > - int restarts; > - int lock_result; > - int flush_log; > - > -#define XFS_TRANS_PUSH_AIL_RESTARTS 1000 > > spin_lock(&mp->m_ail_lock); > lip = xfs_trans_first_ail(mp, &gen); > @@ -100,57 +97,105 @@ xfs_trans_push_ail( > spin_unlock(&mp->m_ail_lock); > return (xfs_lsn_t)0; > } > + if (XFS_LSN_CMP(threshold_lsn, mp->m_ail.xa_target) > 0) Is this conditional necessary? Can we just call xfsaild_wakeup() and let it do the same thing? > + xfsaild_wakeup(mp, threshold_lsn); > + spin_unlock(&mp->m_ail_lock); > + return (xfs_lsn_t)lip->li_lsn; > +} > + > +/* > + * Return the item in the AIL with the current lsn. > + * Return the current tree generation number for use > + * in calls to xfs_trans_next_ail(). > + */ > +STATIC xfs_log_item_t * > +xfs_trans_first_push_ail( > + xfs_mount_t *mp, > + int *gen, > + xfs_lsn_t lsn) > +{ > + xfs_log_item_t *lip; > + > + lip = xfs_ail_min(&(mp->m_ail.xa_ail)); > + *gen = (int)mp->m_ail.xa_gen; > + while (lip && (XFS_LSN_CMP(lip->li_lsn, lsn) < 0)) > + lip = lip->li_ail.ail_forw; > + > + return (lip); > +} > + > +/* > + * Function that does the work of pushing on the AIL > + */ > +long > +xfsaild_push( > + xfs_mount_t *mp, > + xfs_lsn_t *last_lsn) > +{ > + long tout = 100; /* milliseconds */ > + xfs_lsn_t last_pushed_lsn = *last_lsn; > + xfs_lsn_t target = mp->m_ail.xa_target; > + xfs_lsn_t lsn; > + xfs_log_item_t *lip; > + int lock_result; > + int gen; > + int restarts; restarts needs to be initialised > + int flush_log, count, stuck; > + > +#define XFS_TRANS_PUSH_AIL_RESTARTS 10 > + > + spin_lock(&mp->m_ail_lock); > + lip = xfs_trans_first_push_ail(mp, &gen, *last_lsn); > + if (lip == NULL || XFS_FORCED_SHUTDOWN(mp)) { > + /* > + * AIL is empty or our push has reached the end. 
> + */ > + spin_unlock(&mp->m_ail_lock); > + last_pushed_lsn = 0; > + goto out; > + } > > XFS_STATS_INC(xs_push_ail); > > /* > * While the item we are looking at is below the given threshold > - * try to flush it out. Make sure to limit the number of times > - * we allow xfs_trans_next_ail() to restart scanning from the > - * beginning of the list. We'd like not to stop until we've at least > + * try to flush it out. We'd like not to stop until we've at least > * tried to push on everything in the AIL with an LSN less than > - * the given threshold. However, we may give up before that if > - * we realize that we've been holding the AIL lock for 'too long', > - * blocking interrupts. Currently, too long is < 500us roughly. > + * the given threshold. > + * > + * However, we will stop after a certain number of pushes and wait > + * for a reduced timeout to fire before pushing further. This > + * prevents use from spinning when we can't do anything or there is > + * lots of contention on the AIL lists. > */ > - flush_log = 0; > - restarts = 0; > - while (((restarts < XFS_TRANS_PUSH_AIL_RESTARTS) && > - (XFS_LSN_CMP(lip->li_lsn, threshold_lsn) < 0))) { > + tout = 10; > + lsn = lip->li_lsn; > + flush_log = stuck = count = 0; > + while ((XFS_LSN_CMP(lip->li_lsn, target) < 0)) { > /* > - * If we can lock the item without sleeping, unlock > - * the AIL lock and flush the item. Then re-grab the > - * AIL lock so we can look for the next item on the > - * AIL. Since we unlock the AIL while we flush the > - * item, the next routine may start over again at the > - * the beginning of the list if anything has changed. > - * That is what the generation count is for. > + * If we can lock the item without sleeping, unlock the AIL > + * lock and flush the item. Then re-grab the AIL lock so we > + * can look for the next item on the AIL. 
List changes are > + * handled by the AIL lookup functions internally > * > - * If we can't lock the item, either its holder will flush > - * it or it is already being flushed or it is being relogged. > - * In any of these case it is being taken care of and we > - * can just skip to the next item in the list. > + * If we can't lock the item, either its holder will flush it > + * or it is already being flushed or it is being relogged. In > + * any of these case it is being taken care of and we can just > + * skip to the next item in the list. > */ > lock_result = IOP_TRYLOCK(lip); > + spin_unlock(&mp->m_ail_lock); > switch (lock_result) { > case XFS_ITEM_SUCCESS: > - spin_unlock(&mp->m_ail_lock); > XFS_STATS_INC(xs_push_ail_success); > IOP_PUSH(lip); > - spin_lock(&mp->m_ail_lock); > + last_pushed_lsn = lsn; > break; > > case XFS_ITEM_PUSHBUF: > - spin_unlock(&mp->m_ail_lock); > XFS_STATS_INC(xs_push_ail_pushbuf); > -#ifdef XFSRACEDEBUG > - delay_for_intr(); > - delay(300); > -#endif > - ASSERT(lip->li_ops->iop_pushbuf); > - ASSERT(lip); > IOP_PUSHBUF(lip); > - spin_lock(&mp->m_ail_lock); > + last_pushed_lsn = lsn; > break; > > case XFS_ITEM_PINNED: > @@ -160,10 +205,14 @@ xfs_trans_push_ail( > > case XFS_ITEM_LOCKED: > XFS_STATS_INC(xs_push_ail_locked); > + last_pushed_lsn = lsn; > + stuck++; > break; > > case XFS_ITEM_FLUSHING: > XFS_STATS_INC(xs_push_ail_flushing); > + last_pushed_lsn = lsn; > + stuck++; > break; > > default: > @@ -171,19 +220,26 @@ xfs_trans_push_ail( > break; > } > > + spin_lock(&mp->m_ail_lock); > + count++; > + /* Too many items we can't do anything with? */ > + if (stuck > 100) 100? Arbitrary magic number or was there reason for this? > + break; > + /* we're either starting or stopping if there is no log */ > + if (!mp->m_log) Again, can we ASSERT(mp->m_log)? > + break; > + /* should we bother continuing? 
*/ > + if (XFS_FORCED_SHUTDOWN(mp)) > + break; > + /* get the next item */ > lip = xfs_trans_next_ail(mp, lip, &gen, &restarts); > - if (lip == NULL) { > + if (lip == NULL) > break; > - } > - if (XFS_FORCED_SHUTDOWN(mp)) { > - /* > - * Just return if we shut down during the last try. > - */ > - spin_unlock(&mp->m_ail_lock); > - return (xfs_lsn_t)0; > - } > - > + if (restarts > XFS_TRANS_PUSH_AIL_RESTARTS) > + break; > + lsn = lip->li_lsn; > } > + spin_unlock(&mp->m_ail_lock); > > if (flush_log) { > /* > @@ -191,22 +247,33 @@ xfs_trans_push_ail( > * push out the log so it will become unpinned and > * move forward in the AIL. > */ > - spin_unlock(&mp->m_ail_lock); > XFS_STATS_INC(xs_push_ail_flush); > xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE); > - spin_lock(&mp->m_ail_lock); > } > > - lip = xfs_ail_min(&(mp->m_ail)); > - if (lip == NULL) { > - lsn = (xfs_lsn_t)0; > - } else { > - lsn = lip->li_lsn; > + /* > + * We reached the target so wait a bit longer for I/O to complete and > + * remove pushed items from the AIL before we start the next scan from > + * the start of the AIL. > + */ > + if ((XFS_LSN_CMP(lsn, target) >= 0)) { > + tout += 20; > + last_pushed_lsn = 0; > + } else if ((restarts > XFS_TRANS_PUSH_AIL_RESTARTS) || > + (count && (count < (stuck + 10)))) { If 0 < count < 10 and stuck == 0 then we'll think we couldn't flush something - not sure if that is what you intended here. Maybe ((count - stuck) < stuck) ? ie the number of items we successfully flushed is less than the number of items we couldn't flush then back off. > + /* > + * Either there is a lot of contention on the AIL or we > + * found a lot of items we couldn't do anything with. > + * Backoff a bit more to allow some I/O to complete before > + * continuing from where we were. 
> + */ > + tout += 10; > } > > - spin_unlock(&mp->m_ail_lock); > - return lsn; > -} /* xfs_trans_push_ail */ > +out: > + *last_lsn = last_pushed_lsn; > + return tout; > +} /* xfsaild_push */ > > > /* > @@ -247,7 +314,7 @@ xfs_trans_unlocked_item( > * the call to xfs_log_move_tail() doesn't do anything if there's > * not enough free space to wake people up so we're safe calling it. > */ > - min_lip = xfs_ail_min(&mp->m_ail); > + min_lip = xfs_ail_min(&mp->m_ail.xa_ail); > > if (min_lip == lip) > xfs_log_move_tail(mp, 1); > @@ -279,7 +346,7 @@ xfs_trans_update_ail( > xfs_log_item_t *dlip=NULL; > xfs_log_item_t *mlip; /* ptr to minimum lip */ > > - ailp = &(mp->m_ail); > + ailp = &(mp->m_ail.xa_ail); > mlip = xfs_ail_min(ailp); > > if (lip->li_flags & XFS_LI_IN_AIL) { > @@ -292,10 +359,10 @@ xfs_trans_update_ail( > lip->li_lsn = lsn; > > xfs_ail_insert(ailp, lip); > - mp->m_ail_gen++; > + mp->m_ail.xa_gen++; > > if (mlip == dlip) { > - mlip = xfs_ail_min(&(mp->m_ail)); > + mlip = xfs_ail_min(&(mp->m_ail.xa_ail)); > spin_unlock(&mp->m_ail_lock); > xfs_log_move_tail(mp, mlip->li_lsn); > } else { > @@ -330,7 +397,7 @@ xfs_trans_delete_ail( > xfs_log_item_t *mlip; > > if (lip->li_flags & XFS_LI_IN_AIL) { > - ailp = &(mp->m_ail); > + ailp = &(mp->m_ail.xa_ail); > mlip = xfs_ail_min(ailp); > dlip = xfs_ail_delete(ailp, lip); > ASSERT(dlip == lip); > @@ -338,10 +405,10 @@ xfs_trans_delete_ail( > > lip->li_flags &= ~XFS_LI_IN_AIL; > lip->li_lsn = 0; > - mp->m_ail_gen++; > + mp->m_ail.xa_gen++; > > if (mlip == dlip) { > - mlip = xfs_ail_min(&(mp->m_ail)); > + mlip = xfs_ail_min(&(mp->m_ail.xa_ail)); > spin_unlock(&mp->m_ail_lock); > xfs_log_move_tail(mp, (mlip ? 
mlip->li_lsn : 0)); > } else { > @@ -379,10 +446,10 @@ xfs_trans_first_ail( > { > xfs_log_item_t *lip; > > - lip = xfs_ail_min(&(mp->m_ail)); > - *gen = (int)mp->m_ail_gen; > + lip = xfs_ail_min(&(mp->m_ail.xa_ail)); > + *gen = (int)mp->m_ail.xa_gen; > > - return (lip); > + return lip; > } > > /* > @@ -402,11 +469,11 @@ xfs_trans_next_ail( > xfs_log_item_t *nlip; > > ASSERT(mp && lip && gen); > - if (mp->m_ail_gen == *gen) { > - nlip = xfs_ail_next(&(mp->m_ail), lip); > + if (mp->m_ail.xa_gen == *gen) { > + nlip = xfs_ail_next(&(mp->m_ail.xa_ail), lip); > } else { > - nlip = xfs_ail_min(&(mp->m_ail)); > - *gen = (int)mp->m_ail_gen; > + nlip = xfs_ail_min(&(mp->m_ail).xa_ail); > + *gen = (int)mp->m_ail.xa_gen; > if (restarts != NULL) { > XFS_STATS_INC(xs_push_ail_restarts); > (*restarts)++; > @@ -435,8 +502,16 @@ void > xfs_trans_ail_init( > xfs_mount_t *mp) > { > - mp->m_ail.ail_forw = (xfs_log_item_t*)&(mp->m_ail); > - mp->m_ail.ail_back = (xfs_log_item_t*)&(mp->m_ail); > + mp->m_ail.xa_ail.ail_forw = (xfs_log_item_t*)&mp->m_ail.xa_ail; > + mp->m_ail.xa_ail.ail_back = (xfs_log_item_t*)&mp->m_ail.xa_ail; > + xfsaild_start(mp); > +} > + > +void > +xfs_trans_ail_destroy( > + xfs_mount_t *mp) > +{ > + xfsaild_stop(mp); > } > > /* > Index: 2.6.x-xfs-new/fs/xfs/xfs_trans_priv.h > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_trans_priv.h 2007-10-02 16:01:48.000000000 +1000 > +++ 2.6.x-xfs-new/fs/xfs/xfs_trans_priv.h 2007-11-05 14:02:18.784782356 +1100 > @@ -57,4 +57,12 @@ struct xfs_log_item *xfs_trans_next_ail( > struct xfs_log_item *, int *, int *); > > > +/* > + * AIL push thread support > + */ > +long xfsaild_push(struct xfs_mount *, xfs_lsn_t *); > +void xfsaild_wakeup(struct xfs_mount *, xfs_lsn_t); > +void xfsaild_start(struct xfs_mount *); > +void xfsaild_stop(struct xfs_mount *); > + > #endif /* __XFS_TRANS_PRIV_H__ */ > Index: 2.6.x-xfs-new/fs/xfs/xfsidbg.c > 
=================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfsidbg.c 2007-11-02 13:44:50.000000000 +1100 > +++ 2.6.x-xfs-new/fs/xfs/xfsidbg.c 2007-11-05 14:50:43.099049624 +1100 > @@ -6220,13 +6220,13 @@ xfsidbg_xaildump(xfs_mount_t *mp) > }; > int count; > > - if ((mp->m_ail.ail_forw == NULL) || > - (mp->m_ail.ail_forw == (xfs_log_item_t *)&mp->m_ail)) { > + if ((mp->m_ail.xa_ail.ail_forw == NULL) || > + (mp->m_ail.xa_ail.ail_forw == (xfs_log_item_t *)&mp->m_ail.xa_ail)) { > kdb_printf("AIL is empty\n"); > return; > } > kdb_printf("AIL for mp 0x%p, oldest first\n", mp); > - lip = (xfs_log_item_t*)mp->m_ail.ail_forw; > + lip = (xfs_log_item_t*)mp->m_ail.xa_ail.ail_forw; > for (count = 0; lip; count++) { > kdb_printf("[%d] type %s ", count, xfsidbg_item_type_str(lip)); > printflags((uint)(lip->li_flags), li_flags, "flags:"); > @@ -6255,7 +6255,7 @@ xfsidbg_xaildump(xfs_mount_t *mp) > break; > } > > - if (lip->li_ail.ail_forw == (xfs_log_item_t*)&mp->m_ail) { > + if (lip->li_ail.ail_forw == (xfs_log_item_t*)&mp->m_ail.xa_ail) { > lip = NULL; > } else { > lip = lip->li_ail.ail_forw; > @@ -6312,9 +6312,9 @@ xfsidbg_xmount(xfs_mount_t *mp) > > kdb_printf("xfs_mount at 0x%p\n", mp); > kdb_printf("tid 0x%x ail_lock 0x%p &ail 0x%p\n", > - mp->m_tid, &mp->m_ail_lock, &mp->m_ail); > + mp->m_tid, &mp->m_ail_lock, &mp->m_ail.xa_ail); > kdb_printf("ail_gen 0x%x &sb 0x%p\n", > - mp->m_ail_gen, &mp->m_sb); > + mp->m_ail.xa_gen, &mp->m_sb); > kdb_printf("sb_lock 0x%p sb_bp 0x%p dev 0x%x logdev 0x%x rtdev 0x%x\n", > &mp->m_sb_lock, mp->m_sb_bp, > mp->m_ddev_targp ? 
mp->m_ddev_targp->bt_dev : 0, > > > From owner-xfs@oss.sgi.com Wed Nov 14 22:38:05 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 14 Nov 2007 22:38:10 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.5 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_63, J_CHICKENPOX_66 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAF6bvk7000500 for ; Wed, 14 Nov 2007 22:38:01 -0800 Received: from pc-bnaujok.melbourne.sgi.com (pc-bnaujok.melbourne.sgi.com [134.14.55.58]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA24786; Thu, 15 Nov 2007 17:38:03 +1100 Date: Thu, 15 Nov 2007 17:40:41 +1100 To: "xfs@oss.sgi.com" , xfs-dev Subject: [REVIEW] Refactor xfs_repair's process_dinode_int From: "Barry Naujok" Organization: SGI Content-Type: multipart/mixed; boundary=----------LgAlWvtlkAZ8Uw20CyzBdz MIME-Version: 1.0 Message-ID: User-Agent: Opera Mail/9.24 (Win32) X-Virus-Scanned: ClamAV 0.91.2/4796/Wed Nov 14 16:09:59 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13673 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: bnaujok@sgi.com Precedence: bulk X-list: xfs ------------LgAlWvtlkAZ8Uw20CyzBdz Content-Type: text/plain; format=flowed; delsp=yes; charset=utf-8 Content-Transfer-Encoding: 7bit Implementing casefold-table checking in xfs_repair, I have to touch process_dinode_int. It's a horrendous function. The attached patch hopefully makes it much clearer what it does and removes a lot of duplicate code when bad inodes are found. There are some obscure bug fixes too (eg. two places where the inode's di_mode is updated, but not marked dirty - libxfs would have tossed it). 
The refactoring involved removing unused variables, working out what various variables actually did and using them appropriately, and breaking blocks of functionality into separate functions. Barry. ------------LgAlWvtlkAZ8Uw20CyzBdz Content-Disposition: attachment; filename=dinode.patch Content-Type: text/x-patch; name=dinode.patch Content-Transfer-Encoding: Quoted-Printable =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D xfsprogs/repair/dino_chunks.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- a/xfsprogs/repair/dino_chunks.c 2007-11-15 17:24:33.000000000 +1100 +++ b/xfsprogs/repair/dino_chunks.c 2007-11-14 15:41:03.188152397 +1100 @@ -593,7 +593,6 @@ process_inode_chunk( xfs_agino_t agino; xfs_agblock_t agbno; int dirty =3D 0; - int cleared =3D 0; int isa_dir =3D 0; int blks_per_cluster; int cluster_count; @@ -777,8 +776,7 @@ process_inode_chunk( =20 status =3D process_dinode(mp, dino, agno, agino, is_inode_free(ino_rec, irec_offset), - &ino_dirty, &cleared, &is_used, - ino_discovery, check_dups, + &ino_dirty, &is_used,ino_discovery, check_dups, extra_attr_check, &isa_dir, &parent); =20 ASSERT(is_used !=3D 3); =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D xfsprogs/repair/dinode.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- 
a/xfsprogs/repair/dinode.c 2007-11-15 17:24:33.000000000 +1100 +++ b/xfsprogs/repair/dinode.c 2007-11-15 17:23:49.322691248 +1100 @@ -58,9 +58,6 @@ calc_attr_offset(xfs_mount_t *mp, xfs_di case XFS_DINODE_FMT_LOCAL: offset +=3D INT_GET(dinoc->di_size, ARCH_CONVERT); break; - case XFS_DINODE_FMT_UUID: - offset +=3D sizeof(uuid_t); - break; case XFS_DINODE_FMT_EXTENTS: offset +=3D INT_GET(dinoc->di_nextents, ARCH_CONVERT) * sizeof(xfs_bmbt_= rec_32_t); break; @@ -1563,8 +1560,11 @@ null_check(char *name, int length) * bogus */ int -process_symlink(xfs_mount_t *mp, xfs_ino_t lino, xfs_dinode_t *dino, - blkmap_t *blkmap) +process_symlink( + xfs_mount_t *mp, + xfs_ino_t lino, + xfs_dinode_t *dino, + blkmap_t *blkmap) { xfs_dfsbno_t fsbno; xfs_dinode_core_t *dinoc =3D &dino->di_core; @@ -1673,8 +1673,7 @@ process_symlink(xfs_mount_t *mp, xfs_ino * called to process the set of misc inode special inode types * that have no associated data storage (fifos, pipes, devices, etc.). */ -/* ARGSUSED */ -int +static int process_misc_ino_types(xfs_mount_t *mp, xfs_dinode_t *dino, xfs_ino_t lino, @@ -1693,27 +1692,27 @@ process_misc_ino_types(xfs_mount_t *mp, /* * must also have a zero size */ - if (INT_GET(dino->di_core.di_size, ARCH_CONVERT) !=3D 0) { + if (dino->di_core.di_size !=3D 0) { switch (type) { case XR_INO_CHRDEV: do_warn(_("size of character device inode %llu !=3D 0 " "(%lld bytes)\n"), lino, - INT_GET(dino->di_core.di_size, ARCH_CONVERT)); + be64_to_cpu(dino->di_core.di_size)); break; case XR_INO_BLKDEV: do_warn(_("size of block device inode %llu !=3D 0 " "(%lld bytes)\n"), lino, - INT_GET(dino->di_core.di_size, ARCH_CONVERT)); + be64_to_cpu(dino->di_core.di_size)); break; case XR_INO_SOCK: do_warn(_("size of socket inode %llu !=3D 0 " "(%lld bytes)\n"), lino, - INT_GET(dino->di_core.di_size, ARCH_CONVERT)); + be64_to_cpu(dino->di_core.di_size)); break; case XR_INO_FIFO: do_warn(_("size of fifo inode %llu !=3D 0 " "(%lld bytes)\n"), lino, - 
INT_GET(dino->di_core.di_size, ARCH_CONVERT)); + be64_to_cpu(dino->di_core.di_size)); break; default: do_warn(_("Internal error - process_misc_ino_types, " @@ -1769,712 +1768,393 @@ process_misc_ino_types_blocks(xfs_drfsbn return (0); } =20 -/* - * returns 0 if the inode is ok, 1 if the inode is corrupt - * check_dups can be set to 1 *only* when called by the - * first pass of the duplicate block checking of phase 4. - * *dirty is set > 0 if the dinode has been altered and - * needs to be written out. - * - * for detailed, info, look at process_dinode() comments. - */ -/* ARGSUSED */ -int -process_dinode_int(xfs_mount_t *mp, - xfs_dinode_t *dino, - xfs_agnumber_t agno, - xfs_agino_t ino, - int was_free, /* 1 if inode is currently free */ - int *dirty, /* out =3D=3D > 0 if inode is now dirty */ - int *cleared, /* out =3D=3D 1 if inode was cleared */ - int *used, /* out =3D=3D 1 if inode is in use */ - int verify_mode, /* 1 =3D=3D verify but don't modify inode */ - int uncertain, /* 1 =3D=3D inode is uncertain */ - int ino_discovery, /* 1 =3D=3D check dirs for unknown inodes */ - int check_dups, /* 1 =3D=3D check if inode claims - * duplicate blocks */ - int extra_attr_check, /* 1 =3D=3D do attribute format and value checks */ - int *isa_dir, /* out =3D=3D 1 if inode is a directory */ - xfs_ino_t *parent) /* out -- parent if ino is a dir */ +static inline int +dinode_fmt( + xfs_dinode_core_t *dinoc) { - xfs_drfsbno_t totblocks =3D 0; - xfs_drfsbno_t atotblocks =3D 0; - xfs_dinode_core_t *dinoc; - char *rstring; - int type; - int rtype; - int do_rt; - int err; - int retval =3D 0; - __uint64_t nextents; - __uint64_t anextents; - xfs_ino_t lino; - const int is_free =3D 0; - const int is_used =3D 1; - int repair =3D 0; - blkmap_t *ablkmap =3D NULL; - blkmap_t *dblkmap =3D NULL; - static char okfmts[] =3D { - 0, /* free inode */ - 1 << XFS_DINODE_FMT_DEV, /* FIFO */ - 1 << XFS_DINODE_FMT_DEV, /* CHR */ - 0, /* type 3 unused */ - (1 << XFS_DINODE_FMT_LOCAL) | - (1 << 
XFS_DINODE_FMT_EXTENTS) | - (1 << XFS_DINODE_FMT_BTREE), /* DIR */ - 0, /* type 5 unused */ - 1 << XFS_DINODE_FMT_DEV, /* BLK */ - 0, /* type 7 unused */ - (1 << XFS_DINODE_FMT_EXTENTS) | - (1 << XFS_DINODE_FMT_BTREE), /* REG */ - 0, /* type 9 unused */ - (1 << XFS_DINODE_FMT_LOCAL) | - (1 << XFS_DINODE_FMT_EXTENTS), /* LNK */ - 0, /* type 11 unused */ - 1 << XFS_DINODE_FMT_DEV, /* SOCK */ - 0, /* type 13 unused */ - 1 << XFS_DINODE_FMT_UUID, /* MNT */ - 0 /* type 15 unused */ - }; - - retval =3D 0; - totblocks =3D atotblocks =3D 0; - *dirty =3D *isa_dir =3D *cleared =3D 0; - *used =3D is_used; - type =3D rtype =3D XR_INO_UNKNOWN; - rstring =3D NULL; - do_rt =3D 0; + return be16_to_cpu(dinoc->di_mode) & S_IFMT; +} =20 - dinoc =3D &dino->di_core; - lino =3D XFS_AGINO_TO_INO(mp, agno, ino); +static inline void +change_dinode_fmt( + xfs_dinode_core_t *dinoc, + int new_fmt) +{ + int mode =3D be16_to_cpu(dinoc->di_mode); =20 - /* - * if in verify mode, don't modify the inode. - * - * if correcting, reset stuff that has known values - * - * if in uncertain mode, be silent on errors since we're - * trying to find out if these are inodes as opposed - * to assuming that they are. Just return the appropriate - * return code in that case. - */ + mode &=3D ~S_IFMT; + mode |=3D new_fmt; + dinoc->di_mode =3D cpu_to_be16(mode); +} =20 - if (INT_GET(dinoc->di_magic, ARCH_CONVERT) !=3D XFS_DINODE_MAGIC) { - retval++; - if (!verify_mode) { - do_warn(_("bad magic number 0x%x on inode %llu, "), - INT_GET(dinoc->di_magic, ARCH_CONVERT), lino); +static int +check_dinode_mode_format( + xfs_dinode_core_t *dinoc) +{ + if ((uchar_t)dinoc->di_format >=3D XFS_DINODE_FMT_UUID) + return -1; /* FMT_UUID is not used */ + + switch (dinode_fmt(dinoc)) { + case S_IFIFO: + case S_IFCHR: + case S_IFBLK: + case S_IFSOCK: + return (dinoc->di_format !=3D XFS_DINODE_FMT_DEV) ? -1 : 0; + + case S_IFDIR: + return (dinoc->di_format < XFS_DINODE_FMT_LOCAL || + dinoc->di_format > XFS_DINODE_FMT_BTREE) ? 
-1 : 0; + + case S_IFREG: + return (dinoc->di_format < XFS_DINODE_FMT_EXTENTS || + dinoc->di_format > XFS_DINODE_FMT_BTREE) ? -1 : 0; + + case S_IFLNK: + return (dinoc->di_format < XFS_DINODE_FMT_LOCAL || + dinoc->di_format > XFS_DINODE_FMT_EXTENTS) ? -1 : 0; + + default: ; + } + return 0; /* invalid modes are checked elsewhere */ +} + +/* + * If inode is a superblock inode, does type check to make sure it is vali= d. + * Returns 0 if it's valid, non-zero if it needs to be cleared. + */ + +static int +process_check_sb_inodes( + xfs_mount_t *mp, + xfs_dinode_core_t *dinoc, + xfs_ino_t lino, + int *type, + int *dirty) +{ + if (lino =3D=3D mp->m_sb.sb_rootino) { + if (*type !=3D XR_INO_DIR) { + do_warn(_("root inode %llu has bad type 0x%x\n"), + lino, dinode_fmt(dinoc)); + *type =3D XR_INO_DIR; if (!no_modify) { - do_warn(_("resetting magic number\n")); + do_warn(_("resetting to directory\n")); + change_dinode_fmt(dinoc, S_IFDIR); *dirty =3D 1; - INT_SET(dinoc->di_magic, ARCH_CONVERT, - XFS_DINODE_MAGIC); - } else { - do_warn(_("would reset magic number\n")); - } - } else if (!uncertain) { - do_warn(_("bad magic number 0x%x on inode %llu\n"), - INT_GET(dinoc->di_magic, ARCH_CONVERT), lino); + } else + do_warn(_("would reset to directory\n")); } + return 0; } - - if (!XFS_DINODE_GOOD_VERSION(dinoc->di_version) || - (!fs_inode_nlink && dinoc->di_version > XFS_DINODE_VERSION_1)) { - retval++; - if (!verify_mode) { - do_warn(_("bad version number 0x%x on inode %llu, "), - dinoc->di_version, lino); + if (lino =3D=3D mp->m_sb.sb_uquotino) { + if (*type !=3D XR_INO_DATA) { + do_warn(_("user quota inode %llu has bad type 0x%x\n"), + lino, dinode_fmt(dinoc)); + mp->m_sb.sb_uquotino =3D NULLFSINO; + return 1; + } + return 0; + } + if (lino =3D=3D mp->m_sb.sb_gquotino) { + if (*type !=3D XR_INO_DATA) { + do_warn(_("group quota inode %llu has bad type 0x%x\n"), + lino, dinode_fmt(dinoc)); + mp->m_sb.sb_gquotino =3D NULLFSINO; + return 1; + } + return 0; + } + if (lino =3D=3D 
mp->m_sb.sb_rsumino) { + if (*type !=3D XR_INO_RTSUM) { + do_warn(_("realtime summary inode %llu has bad type 0x%x, "), + lino, dinode_fmt(dinoc)); if (!no_modify) { - do_warn(_("resetting version number\n")); + do_warn(_("resetting to regular file\n")); + change_dinode_fmt(dinoc, S_IFREG); *dirty =3D 1; - dinoc->di_version =3D (fs_inode_nlink) ? - XFS_DINODE_VERSION_2 : - XFS_DINODE_VERSION_1; } else { - do_warn(_("would reset version number\n")); + do_warn(_("would reset to regular file\n")); } - } else if (!uncertain) { - do_warn(_("bad version number 0x%x on inode %llu\n"), - dinoc->di_version, lino); } + if (mp->m_sb.sb_rblocks =3D=3D 0 && dinoc->di_nextents !=3D 0) { + do_warn(_("bad # of extents (%u) for realtime summary inode %llu\n"), + be32_to_cpu(dinoc->di_nextents), lino); + return 1; + } + return 0; } - - /* - * blow out of here if the inode size is < 0 - */ - if (INT_GET(dinoc->di_size, ARCH_CONVERT) < 0) { - retval++; - if (!verify_mode) { - do_warn(_("bad (negative) size %lld on inode %llu\n"), - INT_GET(dinoc->di_size, ARCH_CONVERT), lino); + if (lino =3D=3D mp->m_sb.sb_rbmino) { + if (*type !=3D XR_INO_RTBITMAP) { + do_warn(_("realtime bitmap inode %llu has bad type 0x%x, "), + lino, dinode_fmt(dinoc)); if (!no_modify) { - *dirty +=3D clear_dinode(mp, dino, lino); - *cleared =3D 1; - } else { + do_warn(_("resetting to regular file\n")); + change_dinode_fmt(dinoc, S_IFREG); *dirty =3D 1; - *cleared =3D 1; + } else { + do_warn(_("would reset to regular file\n")); } - *used =3D is_free; - } else if (!uncertain) { - do_warn(_("bad (negative) size %lld on inode %llu\n"), - INT_GET(dinoc->di_size, ARCH_CONVERT), lino); } - - return(1); + if (mp->m_sb.sb_rblocks =3D=3D 0 && dinoc->di_nextents !=3D 0) { + do_warn(_("bad # of extents (%u) for realtime bitmap inode %llu\n"), + be32_to_cpu(dinoc->di_nextents), lino); + return 1; + } + return 0; } + return 0; +} =20 - /* - * was_free value is not meaningful if we're in verify mode - */ - if (!verify_mode && 
INT_GET(dinoc->di_mode, ARCH_CONVERT) =3D=3D 0 && was= _free =3D=3D 1) { - /* - * easy case, inode free -- inode and map agree, clear - * it just in case to ensure that format, etc. are - * set correctly - */ - if (!no_modify) { - err =3D clear_dinode(mp, dino, lino); - if (err) { - *dirty =3D 1; - *cleared =3D 1; - } +/* + * general size/consistency checks: + * + * if the size <=3D size of the data fork, directories must be + * local inodes unlike regular files which would be extent inodes. + * all the other mentioned types have to have a zero size value. + * + * if the size and format don't match, get out now rather than + * risk trying to process a non-existent extents or btree + * type data fork. + */ +static int +process_check_inode_sizes( + xfs_mount_t *mp, + xfs_dinode_t *dino, + xfs_ino_t lino, + int type) +{ + xfs_dinode_core_t *dinoc =3D &dino->di_core; + xfs_fsize_t size =3D be64_to_cpu(dinoc->di_size); + + switch (type) { + + case XR_INO_DIR: + if (size <=3D XFS_DFORK_DSIZE(dino, mp) && + dinoc->di_format !=3D XFS_DINODE_FMT_LOCAL) { + do_warn(_("mismatch between format (%d) and size " + "(%lld) in directory ino %llu\n"), + dinoc->di_format, size, lino); + return 1; } - *used =3D is_free; - return(0); - } else if (!verify_mode && INT_GET(dinoc->di_mode, ARCH_CONVERT) =3D=3D 0= && was_free =3D=3D 0) { + break; + + case XR_INO_SYMLINK: + if (process_symlink_extlist(mp, lino, dino)) { + do_warn(_("bad data fork in symlink %llu\n"), lino); + return 1; + } + break; + + case XR_INO_CHRDEV: /* fall through to FIFO case ... */ + case XR_INO_BLKDEV: /* fall through to FIFO case ... */ + case XR_INO_SOCK: /* fall through to FIFO case ... */ + case XR_INO_MOUNTPOINT: /* fall through to FIFO case ... */ + case XR_INO_FIFO: + if (process_misc_ino_types(mp, dino, lino, type)) + return 1; + break; + + case XR_INO_RTDATA: /* - * the inode looks free but the map says it's in use. - * clear the inode just to be safe and mark the inode - * free. 
+ * if we have no realtime blocks, any inode claiming + * to be a real-time file is bogus */ - do_warn(_("imap claims a free inode %llu is in use, "), lino); - - if (!no_modify) { - do_warn(_("correcting imap and clearing inode\n")); + if (mp->m_sb.sb_rblocks =3D=3D 0) { + do_warn(_("found inode %llu claiming to be a " + "real-time file\n"), lino); + return 1; + } + break; =20 - err =3D clear_dinode(mp, dino, lino); - if (err) { - retval++; - *dirty =3D 1; - *cleared =3D 1; - } - } else { - do_warn(_("would correct imap and clear inode\n")); + case XR_INO_RTBITMAP: + if (size !=3D (__int64_t)mp->m_sb.sb_rbmblocks * + mp->m_sb.sb_blocksize) { + do_warn(_("realtime bitmap inode %llu has bad size " + "%lld (should be %lld)\n"), + lino, size, (__int64_t) mp->m_sb.sb_rbmblocks * + mp->m_sb.sb_blocksize); + return 1; + } + break; =20 - *dirty =3D 1; - *cleared =3D 1; + case XR_INO_RTSUM: + if (size !=3D mp->m_rsumsize) { + do_warn(_("realtime summary inode %llu has bad size " + "%lld (should be %d)\n"), + lino, size, mp->m_rsumsize); + return 1; } + break; =20 - *used =3D is_free; + default: + break; + } + return 0; +} =20 - return(retval > 0 ? 1 : 0); +/* + * check for illegal values of forkoff + */ +static int +process_check_inode_forkoff( + xfs_mount_t *mp, + xfs_dinode_core_t *dinoc, + xfs_ino_t lino) +{ + if (dinoc->di_forkoff =3D=3D 0) + return 0; + + switch (dinoc->di_format) { + case XFS_DINODE_FMT_DEV: + if (dinoc->di_forkoff !=3D (roundup(sizeof(xfs_dev_t), 8) >> 3)) { + do_warn(_("bad attr fork offset %d in dev inode %llu, " + "should be %d\n"), dinoc->di_forkoff, lino, + (int)(roundup(sizeof(xfs_dev_t), 8) >> 3)); + return 1; + } + break; + case XFS_DINODE_FMT_LOCAL: /* fall through ... */ + case XFS_DINODE_FMT_EXTENTS: /* fall through ... 
*/ + case XFS_DINODE_FMT_BTREE: + if (dinoc->di_forkoff >=3D (XFS_LITINO(mp) >> 3)) { + do_warn(_("bad attr fork offset %d in inode %llu, " + "max=3D%d\n"), dinoc->di_forkoff, lino, + XFS_LITINO(mp) >> 3); + return 1; + } + break; + default: + do_error(_("unexpected inode format %d\n"), dinoc->di_format); + break; } + return 0; +} =20 - /* - * because of the lack of any write ordering guarantee, it's - * possible that the core got updated but the forks didn't. - * so rather than be ambitious (and probably incorrect), - * if there's an inconsistency, we get conservative and - * just pitch the file. blow off checking formats of - * free inodes since technically any format is legal - * as we reset the inode when we re-use it. - */ - if (INT_GET(dinoc->di_mode, ARCH_CONVERT) !=3D 0 && - ((((INT_GET(dinoc->di_mode, ARCH_CONVERT) & S_IFMT) >> 12) > 15) || - (uchar_t) dinoc->di_format > XFS_DINODE_FMT_UUID || - (!(okfmts[(INT_GET(dinoc->di_mode, ARCH_CONVERT) & S_IFMT) >> 12] & - (1 << dinoc->di_format))))) { - /* bad inode format */ - retval++; - if (!uncertain) - do_warn(_("bad inode format in inode %llu\n"), lino); - if (!verify_mode) { - if (!no_modify) { - *dirty +=3D clear_dinode(mp, dino, lino); - ASSERT(*dirty > 0); - } +/* + * Updates the inodes block and extent counts if they are wrong + */ +static int +process_inode_blocks_and_extents( + xfs_dinode_core_t *dinoc, + xfs_drfsbno_t nblocks, + __uint64_t nextents, + __uint64_t anextents, + xfs_ino_t lino, + int *dirty) +{ + if (nblocks !=3D be64_to_cpu(dinoc->di_nblocks)) { + if (!no_modify) { + do_warn(_("correcting nblocks for inode %llu, " + "was %llu - counted %llu\n"), lino, + be64_to_cpu(dinoc->di_nblocks), nblocks); + dinoc->di_nblocks =3D cpu_to_be64(nblocks); + *dirty =3D 1; + } else { + do_warn(_("bad nblocks %llu for inode %llu, " + "would reset to %llu\n"), + be64_to_cpu(dinoc->di_nblocks), lino, nblocks); } - *cleared =3D 1; - *used =3D is_free; + } =20 - return(retval > 0 ? 
1 : 0); + if (nextents > MAXEXTNUM) { + do_warn(_("too many data fork extents (%llu) in inode %llu\n"), + nextents, lino); + return 1; + } + if (nextents !=3D be32_to_cpu(dinoc->di_nextents)) { + if (!no_modify) { + do_warn(_("correcting nextents for inode %llu, " + "was %d - counted %llu\n"), lino, + be32_to_cpu(dinoc->di_nextents), nextents); + dinoc->di_nextents =3D cpu_to_be32(nextents); + *dirty =3D 1; + } else { + do_warn(_("bad nextents %d for inode %llu, would reset " + "to %llu\n"), be32_to_cpu(dinoc->di_nextents), + lino, nextents); + } } =20 - if (verify_mode) - return(retval > 0 ? 1 : 0); + if (anextents > MAXAEXTNUM) { + do_warn(_("too many attr fork extents (%llu) in inode %llu\n"), + anextents, lino); + return 1; + } + if (anextents !=3D be16_to_cpu(dinoc->di_anextents)) { + if (!no_modify) { + do_warn(_("correcting anextents for inode %llu, " + "was %d - counted %llu\n"), lino, + be16_to_cpu(dinoc->di_anextents), anextents); + dinoc->di_anextents =3D cpu_to_be16(anextents); + *dirty =3D 1; + } else { + do_warn(_("bad anextents %d for inode %llu, would reset" + " to %llu\n"), be16_to_cpu(dinoc->di_anextents), + lino, anextents); + } + } + return 0; +} =20 - /* - * clear the next unlinked field if necessary on a good - * inode only during phase 4 -- when checking for inodes - * referencing duplicate blocks. then it's safe because - * we've done the inode discovery and have found all the inodes - * we're going to find. check_dups is set to 1 only during - * phase 4. Ugly. 
- */ - if (check_dups && !no_modify) - *dirty +=3D clear_dinode_unlinked(mp, dino); +/* + * check data fork -- if it's bad, clear the inode + */ +static int +process_inode_data_fork( + xfs_mount_t *mp, + xfs_agnumber_t agno, + xfs_agino_t ino, + xfs_dinode_t *dino, + int type, + int *dirty, + xfs_drfsbno_t *totblocks, + __uint64_t *nextents, + blkmap_t **dblkmap, + int check_dups) +{ + xfs_dinode_core_t *dinoc =3D &dino->di_core; + xfs_ino_t lino =3D XFS_AGINO_TO_INO(mp, agno, ino); + int err =3D 0; =20 - /* set type and map type info */ + *nextents =3D be32_to_cpu(dinoc->di_nextents); + if (*nextents > be64_to_cpu(dinoc->di_nblocks) || + *nextents > XFS_MAX_INCORE_EXTENTS) + *nextents =3D 1; + + if (dinoc->di_format !=3D XFS_DINODE_FMT_LOCAL && type !=3D XR_INO_RTDATA) + *dblkmap =3D blkmap_alloc(*nextents); + *nextents =3D 0; =20 - switch (INT_GET(dinoc->di_mode, ARCH_CONVERT) & S_IFMT) { - case S_IFDIR: - type =3D XR_INO_DIR; - *isa_dir =3D 1; - break; - case S_IFREG: - if (INT_GET(dinoc->di_flags, ARCH_CONVERT) & XFS_DIFLAG_REALTIME) - type =3D XR_INO_RTDATA; - else if (lino =3D=3D mp->m_sb.sb_rbmino) - type =3D XR_INO_RTBITMAP; - else if (lino =3D=3D mp->m_sb.sb_rsumino) - type =3D XR_INO_RTSUM; - else - type =3D XR_INO_DATA; - break; - case S_IFLNK: - type =3D XR_INO_SYMLINK; - break; - case S_IFCHR: - type =3D XR_INO_CHRDEV; - break; - case S_IFBLK: - type =3D XR_INO_BLKDEV; - break; - case S_IFSOCK: - type =3D XR_INO_SOCK; - break; - case S_IFIFO: - type =3D XR_INO_FIFO; - break; - default: - retval++; - if (!verify_mode) { - do_warn(_("bad inode type %#o inode %llu\n"), - (int) (INT_GET(dinoc->di_mode, ARCH_CONVERT) & S_IFMT), lino); - if (!no_modify) - *dirty +=3D clear_dinode(mp, dino, lino); - else - *dirty =3D 1; - *cleared =3D 1; - *used =3D is_free; - } else if (!uncertain) { - do_warn(_("bad inode type %#o inode %llu\n"), - (int) (INT_GET(dinoc->di_mode, ARCH_CONVERT) & S_IFMT), lino); - } - return 1; - } - - /* - * type checks for root, realtime 
inodes, and quota inodes - */ - if (lino =3D=3D mp->m_sb.sb_rootino && type !=3D XR_INO_DIR) { - do_warn(_("bad inode type for root inode %llu, "), lino); - type =3D XR_INO_DIR; - - if (!no_modify) { - do_warn(_("resetting to directory\n")); - INT_MOD_EXPR(dinoc->di_mode, ARCH_CONVERT, - &=3D ~(INT_GET(dinoc->di_mode, ARCH_CONVERT) & S_IFMT)); - INT_MOD_EXPR(dinoc->di_mode, ARCH_CONVERT, - |=3D INT_GET(dinoc->di_mode, ARCH_CONVERT) & S_IFDIR); - } else { - do_warn(_("would reset to directory\n")); - } - } else if (lino =3D=3D mp->m_sb.sb_rsumino) { - do_rt =3D 1; - rstring =3D _("summary"); - rtype =3D XR_INO_RTSUM; - } else if (lino =3D=3D mp->m_sb.sb_rbmino) { - do_rt =3D 1; - rstring =3D _("bitmap"); - rtype =3D XR_INO_RTBITMAP; - } else if (lino =3D=3D mp->m_sb.sb_uquotino) { - if (type !=3D XR_INO_DATA) { - do_warn(_("user quota inode has bad type 0x%x\n"), - INT_GET(dinoc->di_mode, ARCH_CONVERT) & S_IFMT); - - if (!no_modify) { - *dirty +=3D clear_dinode(mp, dino, lino); - ASSERT(*dirty > 0); - } - - *cleared =3D 1; - *used =3D is_free; - *isa_dir =3D 0; - - mp->m_sb.sb_uquotino =3D NULLFSINO; - - return(1); - } - } else if (lino =3D=3D mp->m_sb.sb_gquotino) { - if (type !=3D XR_INO_DATA) { - do_warn(_("group quota inode has bad type 0x%x\n"), - INT_GET(dinoc->di_mode, ARCH_CONVERT) & S_IFMT); - - if (!no_modify) { - *dirty +=3D clear_dinode(mp, dino, lino); - ASSERT(*dirty > 0); - } - - *cleared =3D 1; - *used =3D is_free; - *isa_dir =3D 0; - - mp->m_sb.sb_gquotino =3D NULLFSINO; - - return(1); - } - } - - if (do_rt && type !=3D rtype) { - type =3D XR_INO_DATA; - - do_warn(_("bad inode type for realtime %s inode %llu, "), - rstring, lino); - - if (!no_modify) { - do_warn(_("resetting to regular file\n")); - INT_MOD_EXPR(dinoc->di_mode, ARCH_CONVERT, - &=3D ~(INT_GET(dinoc->di_mode, ARCH_CONVERT) & S_IFMT)); - INT_MOD_EXPR(dinoc->di_mode, ARCH_CONVERT, - |=3D INT_GET(dinoc->di_mode, ARCH_CONVERT) & S_IFREG); - } else { - do_warn(_("would reset to regular 
file\n")); - } - } - - /* - * only regular files with REALTIME or EXTSIZE flags set can have - * extsize set, or directories with EXTSZINHERIT. - */ - if (INT_GET(dinoc->di_extsize, ARCH_CONVERT) !=3D 0) { - if ((type =3D=3D XR_INO_RTDATA) || - (type =3D=3D XR_INO_DIR && - (INT_GET(dinoc->di_flags, ARCH_CONVERT) & - XFS_DIFLAG_EXTSZINHERIT)) || - (type =3D=3D XR_INO_DATA && - (INT_GET(dinoc->di_flags, ARCH_CONVERT) & - XFS_DIFLAG_EXTSIZE))) { - /* s'okay */ ; - } else { - do_warn( - _("bad non-zero extent size %u for non-realtime/extsize inode %llu, "), - INT_GET(dinoc->di_extsize, ARCH_CONVERT), lino); - - if (!no_modify) { - do_warn(_("resetting to zero\n")); - dinoc->di_extsize =3D 0; - *dirty =3D 1; - } else { - do_warn(_("would reset to zero\n")); - } - } - } - - /* - * for realtime inodes, check sizes to see that - * they are consistent with the # of realtime blocks. - * also, verify that they contain only one extent and - * are extent format files. If anything's wrong, clear - * the inode -- we'll recreate it in phase 6. 
- */ - if (do_rt && - ((lino =3D=3D mp->m_sb.sb_rbmino && - INT_GET(dinoc->di_size, ARCH_CONVERT) - !=3D mp->m_sb.sb_rbmblocks * mp->m_sb.sb_blocksize) || - (lino =3D=3D mp->m_sb.sb_rsumino && - INT_GET(dinoc->di_size, ARCH_CONVERT) !=3D mp->m_rsumsize))) { - - do_warn(_("bad size %llu for realtime %s inode %llu\n"), - INT_GET(dinoc->di_size, ARCH_CONVERT), rstring, lino); - - if (!no_modify) { - *dirty +=3D clear_dinode(mp, dino, lino); - ASSERT(*dirty > 0); - } - - *cleared =3D 1; - *used =3D is_free; - *isa_dir =3D 0; - - return(1); - } - - if (do_rt && mp->m_sb.sb_rblocks =3D=3D 0 && INT_GET(dinoc->di_nextents, = ARCH_CONVERT) !=3D 0) { - do_warn(_("bad # of extents (%u) for realtime %s inode %llu\n"), - INT_GET(dinoc->di_nextents, ARCH_CONVERT), rstring, lino); - - if (!no_modify) { - *dirty +=3D clear_dinode(mp, dino, lino); - ASSERT(*dirty > 0); - } - - *cleared =3D 1; - *used =3D is_free; - *isa_dir =3D 0; - - return(1); - } - - /* - * Setup nextents and anextents for blkmap_alloc calls. - */ - nextents =3D INT_GET(dinoc->di_nextents, ARCH_CONVERT); - if (nextents > INT_GET(dinoc->di_nblocks, ARCH_CONVERT) || nextents > XFS= _MAX_INCORE_EXTENTS) - nextents =3D 1; - anextents =3D INT_GET(dinoc->di_anextents, ARCH_CONVERT); - if (anextents > INT_GET(dinoc->di_nblocks, ARCH_CONVERT) || anextents > X= FS_MAX_INCORE_EXTENTS) - anextents =3D 1; - - /* - * general size/consistency checks: - * - * if the size <=3D size of the data fork, directories must be - * local inodes unlike regular files which would be extent inodes. - * all the other mentioned types have to have a zero size value. - * - * if the size and format don't match, get out now rather than - * risk trying to process a non-existent extents or btree - * type data fork. 
- */ - switch (type) { - case XR_INO_DIR: - if (INT_GET(dinoc->di_size, ARCH_CONVERT) <=3D - XFS_DFORK_DSIZE(dino, mp) && - (dinoc->di_format !=3D XFS_DINODE_FMT_LOCAL)) { - do_warn( -_("mismatch between format (%d) and size (%lld) in directory ino %llu\n"), - dinoc->di_format, - INT_GET(dinoc->di_size, ARCH_CONVERT), - lino); - - if (!no_modify) { - *dirty +=3D clear_dinode(mp, - dino, lino); - ASSERT(*dirty > 0); - } - - *cleared =3D 1; - *used =3D is_free; - *isa_dir =3D 0; - - return(1); - } - if (dinoc->di_format !=3D XFS_DINODE_FMT_LOCAL) - dblkmap =3D blkmap_alloc(nextents); - break; - case XR_INO_SYMLINK: - if (process_symlink_extlist(mp, lino, dino)) { - do_warn(_("bad data fork in symlink %llu\n"), lino); - - if (!no_modify) { - *dirty +=3D clear_dinode(mp, - dino, lino); - ASSERT(*dirty > 0); - } - - *cleared =3D 1; - *used =3D is_free; - *isa_dir =3D 0; - - return(1); - } - if (dinoc->di_format !=3D XFS_DINODE_FMT_LOCAL) - dblkmap =3D blkmap_alloc(nextents); - break; - case XR_INO_CHRDEV: /* fall through to FIFO case ... */ - case XR_INO_BLKDEV: /* fall through to FIFO case ... */ - case XR_INO_SOCK: /* fall through to FIFO case ... */ - case XR_INO_MOUNTPOINT: /* fall through to FIFO case ... 
*/ - case XR_INO_FIFO: - if (process_misc_ino_types(mp, dino, lino, type)) { - if (!no_modify) { - *dirty +=3D clear_dinode(mp, dino, lino); - ASSERT(*dirty > 0); - } - - *cleared =3D 1; - *used =3D is_free; - *isa_dir =3D 0; - - return(1); - } - break; - case XR_INO_RTDATA: - /* - * if we have no realtime blocks, any inode claiming - * to be a real-time file is bogus - */ - if (mp->m_sb.sb_rblocks =3D=3D 0) { - do_warn( - _("found inode %llu claiming to be a real-time file\n"), - lino); - - if (!no_modify) { - *dirty +=3D clear_dinode(mp, dino, lino); - ASSERT(*dirty > 0); - } - - *cleared =3D 1; - *used =3D is_free; - *isa_dir =3D 0; - - return(1); - } - break; - case XR_INO_RTBITMAP: - if (INT_GET(dinoc->di_size, ARCH_CONVERT) !=3D - (__int64_t)mp->m_sb.sb_rbmblocks * mp->m_sb.sb_blocksize) { - do_warn( - _("realtime bitmap inode %llu has bad size %lld (should be %lld)\n"), - lino, INT_GET(dinoc->di_size, ARCH_CONVERT), - (__int64_t) mp->m_sb.sb_rbmblocks * - mp->m_sb.sb_blocksize); - - if (!no_modify) { - *dirty +=3D clear_dinode(mp, dino, lino); - ASSERT(*dirty > 0); - } - - *cleared =3D 1; - *used =3D is_free; - *isa_dir =3D 0; - - return(1); - } - dblkmap =3D blkmap_alloc(nextents); - break; - case XR_INO_RTSUM: - if (INT_GET(dinoc->di_size, ARCH_CONVERT) !=3D mp->m_rsumsize) { - do_warn( - _("realtime summary inode %llu has bad size %lld (should be %d)\n"), - lino, INT_GET(dinoc->di_size, ARCH_CONVERT), - mp->m_rsumsize); - - if (!no_modify) { - *dirty +=3D clear_dinode(mp, dino, lino); - ASSERT(*dirty > 0); - } - - *cleared =3D 1; - *used =3D is_free; - *isa_dir =3D 0; - - return(1); - } - dblkmap =3D blkmap_alloc(nextents); - break; - default: - break; - } - - /* - * check for illegal values of forkoff - */ - err =3D 0; - if (dinoc->di_forkoff !=3D 0) { - switch (dinoc->di_format) { - case XFS_DINODE_FMT_DEV: - if (dinoc->di_forkoff !=3D - (roundup(sizeof(xfs_dev_t), 8) >> 3)) { - do_warn( - _("bad attr fork offset %d in dev inode %llu, should be %d\n"), 
- (int) dinoc->di_forkoff, - lino, - (int) (roundup(sizeof(xfs_dev_t), 8) >> 3)); - err =3D 1; - } - break; - case XFS_DINODE_FMT_UUID: - if (dinoc->di_forkoff !=3D - (roundup(sizeof(uuid_t), 8) >> 3)) { - do_warn( - _("bad attr fork offset %d in uuid inode %llu, should be %d\n"), - (int) dinoc->di_forkoff, - lino, - (int)(roundup(sizeof(uuid_t), 8) >> 3)); - err =3D 1; - } - break; - case XFS_DINODE_FMT_LOCAL: /* fall through ... */ - case XFS_DINODE_FMT_EXTENTS: /* fall through ... */ - case XFS_DINODE_FMT_BTREE: { - if (dinoc->di_forkoff >=3D (XFS_LITINO(mp) >> 3)) { - do_warn( - _("bad attr fork offset %d in inode %llu, max=3D%d\n"), - (int) dinoc->di_forkoff, - lino, XFS_LITINO(mp) >> 3); - err =3D 1; - } - break; - } - default: - do_error(_("unexpected inode format %d\n"), - (int) dinoc->di_format); - break; - } - } - - if (err) { - if (!no_modify) { - *dirty +=3D clear_dinode(mp, dino, lino); - ASSERT(*dirty > 0); - } - - *cleared =3D 1; - *used =3D is_free; - *isa_dir =3D 0; - blkmap_free(dblkmap); - return(1); - } - - /* - * check data fork -- if it's bad, clear the inode - */ - nextents =3D 0; switch (dinoc->di_format) { case XFS_DINODE_FMT_LOCAL: - err =3D process_lclinode(mp, agno, ino, dino, type, - dirty, &totblocks, &nextents, &dblkmap, - XFS_DATA_FORK, check_dups); + err =3D process_lclinode(mp, agno, ino, dino, type, dirty, + totblocks, nextents, dblkmap, XFS_DATA_FORK, + check_dups); break; case XFS_DINODE_FMT_EXTENTS: - err =3D process_exinode(mp, agno, ino, dino, type, - dirty, &totblocks, &nextents, &dblkmap, - XFS_DATA_FORK, check_dups); + err =3D process_exinode(mp, agno, ino, dino, type, dirty, + totblocks, nextents, dblkmap, XFS_DATA_FORK, + check_dups); break; case XFS_DINODE_FMT_BTREE: - err =3D process_btinode(mp, agno, ino, dino, type, - dirty, &totblocks, &nextents, &dblkmap, - XFS_DATA_FORK, check_dups); + err =3D process_btinode(mp, agno, ino, dino, type, dirty, + totblocks, nextents, dblkmap, XFS_DATA_FORK, + check_dups); break; 
case XFS_DINODE_FMT_DEV: /* fall through */ - case XFS_DINODE_FMT_UUID: err =3D 0; break; default: do_error(_("unknown format %d, ino %llu (mode =3D %d)\n"), - dinoc->di_format, lino, - INT_GET(dinoc->di_mode, ARCH_CONVERT)); + dinoc->di_format, lino, be16_to_cpu(dinoc->di_mode)); } =20 if (err) { - /* - * problem in the data fork, clear out the inode - * and get out - */ do_warn(_("bad data fork in inode %llu\n"), lino); - if (!no_modify) { *dirty +=3D clear_dinode(mp, dino, lino); ASSERT(*dirty > 0); } - - *cleared =3D 1; - *used =3D is_free; - *isa_dir =3D 0; - blkmap_free(dblkmap); - return(1); + return 1; } =20 if (check_dups) { @@ -2486,465 +2166,633 @@ _("mismatch between format (%d) and size switch (dinoc->di_format) { case XFS_DINODE_FMT_LOCAL: err =3D process_lclinode(mp, agno, ino, dino, type, - dirty, &totblocks, &nextents, &dblkmap, + dirty, totblocks, nextents, dblkmap, XFS_DATA_FORK, 0); break; case XFS_DINODE_FMT_EXTENTS: err =3D process_exinode(mp, agno, ino, dino, type, - dirty, &totblocks, &nextents, &dblkmap, + dirty, totblocks, nextents, dblkmap, XFS_DATA_FORK, 0); break; case XFS_DINODE_FMT_BTREE: err =3D process_btinode(mp, agno, ino, dino, type, - dirty, &totblocks, &nextents, &dblkmap, + dirty, totblocks, nextents, dblkmap, XFS_DATA_FORK, 0); break; case XFS_DINODE_FMT_DEV: /* fall through */ - case XFS_DINODE_FMT_UUID: err =3D 0; break; default: do_error(_("unknown format %d, ino %llu (mode =3D %d)\n"), dinoc->di_format, lino, - INT_GET(dinoc->di_mode, ARCH_CONVERT)); + be16_to_cpu(dinoc->di_mode)); } =20 - if (no_modify && err !=3D 0) { - *cleared =3D 1; - *used =3D is_free; - *isa_dir =3D 0; - blkmap_free(dblkmap); - return(1); - } + if (no_modify && err !=3D 0) + return 1; =20 ASSERT(err =3D=3D 0); } + return 0; +} =20 - /* - * check attribute fork if necessary. attributes are - * always stored in the regular filesystem. 
- */ +/* + * Process extended attribute fork in inode + */ +static int +process_inode_attr_fork( + xfs_mount_t *mp, + xfs_agnumber_t agno, + xfs_agino_t ino, + xfs_dinode_t *dino, + int type, + int *dirty, + xfs_drfsbno_t *atotblocks, + __uint64_t *anextents, + int check_dups, + int extra_attr_check, + int *retval) +{ + xfs_dinode_core_t *dinoc =3D &dino->di_core; + xfs_ino_t lino =3D XFS_AGINO_TO_INO(mp, agno, ino); + blkmap_t *ablkmap =3D NULL; + int repair =3D 0; + int err; + + if (!XFS_DFORK_Q(dino)) { + *anextents =3D 0; + if (dinoc->di_aformat !=3D XFS_DINODE_FMT_EXTENTS) { + do_warn(_("bad attribute format %d in inode %llu, "), + dinoc->di_aformat, lino); + if (!no_modify) { + do_warn(_("resetting value\n")); + dinoc->di_aformat =3D XFS_DINODE_FMT_EXTENTS; + *dirty =3D 1; + } else + do_warn(_("would reset value\n")); + } + return 0; + } =20 - if (!XFS_DFORK_Q(dino) && - dinoc->di_aformat !=3D XFS_DINODE_FMT_EXTENTS) { - do_warn(_("bad attribute format %d in inode %llu, "), - dinoc->di_aformat, lino); - if (!no_modify) { - do_warn(_("resetting value\n")); - dinoc->di_aformat =3D XFS_DINODE_FMT_EXTENTS; - *dirty =3D 1; - } else - do_warn(_("would reset value\n")); - anextents =3D 0; - } else if (XFS_DFORK_Q(dino)) { + *anextents =3D be32_to_cpu(dinoc->di_anextents); + if (*anextents > be64_to_cpu(dinoc->di_nblocks) || + *anextents > XFS_MAX_INCORE_EXTENTS) + *anextents =3D 1; + + switch (dinoc->di_aformat) { + case XFS_DINODE_FMT_LOCAL: + *anextents =3D 0; + err =3D process_lclinode(mp, agno, ino, dino, type, dirty, + atotblocks, anextents, &ablkmap, + XFS_ATTR_FORK, check_dups); + break; + case XFS_DINODE_FMT_EXTENTS: + ablkmap =3D blkmap_alloc(*anextents); + *anextents =3D 0; + err =3D process_exinode(mp, agno, ino, dino, type, dirty, + atotblocks, anextents, &ablkmap, + XFS_ATTR_FORK, check_dups); + break; + case XFS_DINODE_FMT_BTREE: + ablkmap =3D blkmap_alloc(*anextents); + *anextents =3D 0; + err =3D process_btinode(mp, agno, ino, dino, type, dirty, + 
atotblocks, anextents, &ablkmap, + XFS_ATTR_FORK, check_dups); + break; + default: + do_warn(_("illegal attribute format %d, ino %llu\n"), + dinoc->di_aformat, lino); + err =3D 1; + break; + } + + if (err) { + /* + * clear the attribute fork if necessary. we can't + * clear the inode because we've already put the + * inode space info into the blockmap. + * + * XXX - put the inode onto the "move it" list and + * log the the attribute scrubbing + */ + do_warn(_("bad attribute fork in inode %llu"), lino); + + if (!no_modify) { + if (delete_attr_ok) { + do_warn(_(", clearing attr fork\n")); + *dirty +=3D clear_dinode_attr(mp, dino, lino); + dinoc->di_aformat =3D XFS_DINODE_FMT_LOCAL; + } else { + do_warn("\n"); + *dirty +=3D clear_dinode(mp, dino, lino); + } + ASSERT(*dirty > 0); + } else { + do_warn(_(", would clear attr fork\n")); + } + + *atotblocks =3D 0; + *anextents =3D 0; + blkmap_free(ablkmap); + *retval =3D 1; + + return delete_attr_ok ? 0 : 1; + } + + if (check_dups) { switch (dinoc->di_aformat) { case XFS_DINODE_FMT_LOCAL: - anextents =3D 0; err =3D process_lclinode(mp, agno, ino, dino, - type, dirty, &atotblocks, &anextents, &ablkmap, - XFS_ATTR_FORK, check_dups); + type, dirty, atotblocks, anextents, + &ablkmap, XFS_ATTR_FORK, 0); break; case XFS_DINODE_FMT_EXTENTS: - ablkmap =3D blkmap_alloc(anextents); - anextents =3D 0; err =3D process_exinode(mp, agno, ino, dino, - type, dirty, &atotblocks, &anextents, &ablkmap, - XFS_ATTR_FORK, check_dups); + type, dirty, atotblocks, anextents, + &ablkmap, XFS_ATTR_FORK, 0); break; case XFS_DINODE_FMT_BTREE: - ablkmap =3D blkmap_alloc(anextents); - anextents =3D 0; err =3D process_btinode(mp, agno, ino, dino, - type, dirty, &atotblocks, &anextents, &ablkmap, - XFS_ATTR_FORK, check_dups); + type, dirty, atotblocks, anextents, + &ablkmap, XFS_ATTR_FORK, 0); break; default: - anextents =3D 0; - do_warn(_("illegal attribute format %d, ino %llu\n"), - dinoc->di_aformat, lino); - err =3D 1; - break; + do_error(_("illegal 
attribute fmt %d, ino %llu\n"), + dinoc->di_aformat, lino); } =20 - if (err) { - /* - * clear the attribute fork if necessary. we can't - * clear the inode because we've already put the - * inode space info into the blockmap. - * - * XXX - put the inode onto the "move it" list and - * log the the attribute scrubbing - */ - do_warn(_("bad attribute fork in inode %llu"), lino); + if (no_modify && err !=3D 0) { + blkmap_free(ablkmap); + return 1; + } + + ASSERT(err =3D=3D 0); + } + + /* + * do attribute semantic-based consistency checks now + */ =20 + /* get this only in phase 3, not in both phase 3 and 4 */ + if (extra_attr_check && + process_attributes(mp, lino, dino, ablkmap, &repair)) { + do_warn(_("problem with attribute contents in inode %llu\n"), + lino); + if (!repair) { + /* clear attributes if not done already */ if (!no_modify) { - if (delete_attr_ok) { - do_warn(_(", clearing attr fork\n")); - *dirty +=3D clear_dinode_attr(mp, - dino, lino); - } else { - do_warn("\n"); - *dirty +=3D clear_dinode(mp, - dino, lino); - } - ASSERT(*dirty > 0); + *dirty +=3D clear_dinode_attr(mp, dino, lino); + dinoc->di_aformat =3D XFS_DINODE_FMT_LOCAL; } else { - do_warn(_(", would clear attr fork\n")); + do_warn(_("would clear attr fork\n")); } + *atotblocks =3D 0; + *anextents =3D 0; + } + else { + *dirty =3D 1; /* it's been repaired */ + } + } + blkmap_free(ablkmap); + return 0; +} =20 - atotblocks =3D 0; - anextents =3D 0; +/* + * check nlinks feature, if it's a version 1 inode, + * just leave nlinks alone. even if it's set wrong, + * it'll be reset when read in. + */ =20 - if (delete_attr_ok) { - if (!no_modify) - dinoc->di_aformat =3D XFS_DINODE_FMT_LOCAL; +static int +process_check_inode_nlink_version( + xfs_dinode_core_t *dinoc, + xfs_ino_t lino) +{ + int dirty =3D 0; + + if (dinoc->di_version > XFS_DINODE_VERSION_1 && !fs_inode_nlink) { + /* + * do we have a fs/inode version mismatch with a valid + * version 2 inode here that has to stay version 2 or + * lose links? 
+ */ + if (be32_to_cpu(dinoc->di_nlink) > XFS_MAXLINK_1) { + /* + * yes. are nlink inodes allowed? + */ + if (fs_inode_nlink_allowed) { + /* + * yes, update status variable which will + * cause sb to be updated later. + */ + fs_inode_nlink =3D 1; + do_warn(_("version 2 inode %llu claims > %u links, "), + lino, XFS_MAXLINK_1); + if (!no_modify) { + do_warn(_("updating superblock " + "version number\n")); + } else { + do_warn(_("would update superblock " + "version number\n")); + } } else { - *cleared =3D 1; - *used =3D is_free; - *isa_dir =3D 0; - blkmap_free(dblkmap); - blkmap_free(ablkmap); + /* + * no, have to convert back to onlinks + * even if we lose some links + */ + do_warn(_("WARNING: version 2 inode %llu " + "claims > %u links, "), + lino, XFS_MAXLINK_1); + if (!no_modify) { + do_warn(_("converting back to version 1,\n" + "this may destroy %d links\n"), + be32_to_cpu(dinoc->di_nlink) - + XFS_MAXLINK_1); + + dinoc->di_version =3D XFS_DINODE_VERSION_1; + dinoc->di_nlink =3D cpu_to_be32(XFS_MAXLINK_1); + dinoc->di_onlink =3D cpu_to_be16(XFS_MAXLINK_1); + dirty =3D 1; + } else { + do_warn(_("would convert back to version 1,\n" + "\tthis might destroy %d links\n"), + be32_to_cpu(dinoc->di_nlink) - + XFS_MAXLINK_1); + } } - return(1); + } else { + /* + * do we have a v2 inode that we could convert back + * to v1 without losing any links? if we do and + * we have a mismatch between superblock bits and the + * version bit, alter the version bit in this case. + * + * the case where we lost links was handled above. 
+ */ + do_warn(_("found version 2 inode %llu, "), lino); + if (!no_modify) { + do_warn(_("converting back to version 1\n")); + dinoc->di_version =3D XFS_DINODE_VERSION_1; + dinoc->di_onlink =3D cpu_to_be16( + be32_to_cpu(dinoc->di_nlink)); + dirty =3D 1; + } else { + do_warn(_("would convert back to version 1\n")); + } + } + } =20 - } else if (check_dups) { - switch (dinoc->di_aformat) { - case XFS_DINODE_FMT_LOCAL: - err =3D process_lclinode(mp, agno, ino, dino, - type, dirty, &atotblocks, &anextents, - &ablkmap, XFS_ATTR_FORK, 0); - break; - case XFS_DINODE_FMT_EXTENTS: - err =3D process_exinode(mp, agno, ino, dino, - type, dirty, &atotblocks, &anextents, - &ablkmap, XFS_ATTR_FORK, 0); - break; - case XFS_DINODE_FMT_BTREE: - err =3D process_btinode(mp, agno, ino, dino, - type, dirty, &atotblocks, &anextents, - &ablkmap, XFS_ATTR_FORK, 0); - break; - default: - do_error( - _("illegal attribute fmt %d, ino %llu\n"), - dinoc->di_aformat, lino); + /* + * ok, if it's still a version 2 inode, it's going + * to stay a version 2 inode. it should have a zero + * onlink field, so clear it. + */ + if (dinoc->di_version > XFS_DINODE_VERSION_1 && + dinoc->di_onlink !=3D 0 && fs_inode_nlink > 0) { + if (!no_modify) { + do_warn(_("clearing obsolete nlink field in " + "version 2 inode %llu, was %d, now 0\n"), + lino, be16_to_cpu(dinoc->di_onlink)); + dinoc->di_onlink =3D 0; + dirty =3D 1; + } else { + do_warn(_("would clear obsolete nlink field in " + "version 2 inode %llu, currently %d\n"), + lino, be16_to_cpu(dinoc->di_onlink)); + } + } + return dirty; +} + +/* + * returns 0 if the inode is ok, 1 if the inode is corrupt + * check_dups can be set to 1 *only* when called by the + * first pass of the duplicate block checking of phase 4. + * *dirty is set > 0 if the dinode has been altered and + * needs to be written out. + * + * for detailed, info, look at process_dinode() comments. 
+ */ +/* ARGSUSED */ +int +process_dinode_int(xfs_mount_t *mp, + xfs_dinode_t *dino, + xfs_agnumber_t agno, + xfs_agino_t ino, + int was_free, /* 1 if inode is currently free */ + int *dirty, /* out =3D=3D > 0 if inode is now dirty */ + int *used, /* out =3D=3D 1 if inode is in use */ + int verify_mode, /* 1 =3D=3D verify but don't modify inode */ + int uncertain, /* 1 =3D=3D inode is uncertain */ + int ino_discovery, /* 1 =3D=3D check dirs for unknown inodes */ + int check_dups, /* 1 =3D=3D check if inode claims + * duplicate blocks */ + int extra_attr_check, /* 1 =3D=3D do attribute format and value checks */ + int *isa_dir, /* out =3D=3D 1 if inode is a directory */ + xfs_ino_t *parent) /* out -- parent if ino is a dir */ +{ + xfs_drfsbno_t totblocks =3D 0; + xfs_drfsbno_t atotblocks =3D 0; + xfs_dinode_core_t *dinoc; + int di_mode; + int type; + int retval =3D 0; + __uint64_t nextents; + __uint64_t anextents; + xfs_ino_t lino; + const int is_free =3D 0; + const int is_used =3D 1; + blkmap_t *dblkmap =3D NULL; + + *dirty =3D *isa_dir =3D 0; + *used =3D is_used; + type =3D XR_INO_UNKNOWN; + + dinoc =3D &dino->di_core; + lino =3D XFS_AGINO_TO_INO(mp, agno, ino); + di_mode =3D be16_to_cpu(dinoc->di_mode); + + /* + * if in verify mode, don't modify the inode. + * + * if correcting, reset stuff that has known values + * + * if in uncertain mode, be silent on errors since we're + * trying to find out if these are inodes as opposed + * to assuming that they are. Just return the appropriate + * return code in that case. + * + * If uncertain is set, verify_mode MUST be set. 
+ */ + ASSERT(uncertain =3D=3D 0 || verify_mode !=3D 0); + + if (be16_to_cpu(dinoc->di_magic) !=3D XFS_DINODE_MAGIC) { + retval =3D 1; + if (!uncertain) + do_warn(_("bad magic number 0x%x on inode %llu\n"), + be16_to_cpu(dinoc->di_magic), lino); + if (!verify_mode) { + if (!no_modify) { + do_warn(_("resetting magic number\n")); + dinoc->di_magic =3D cpu_to_be16(XFS_DINODE_MAGIC); + *dirty =3D 1; + } else + do_warn(_("would reset magic number\n")); + } + } + + if (!XFS_DINODE_GOOD_VERSION(dinoc->di_version) || + (!fs_inode_nlink && dinoc->di_version > XFS_DINODE_VERSION_1)) { + retval =3D 1; + if (!uncertain) + do_warn(_("bad version number 0x%x on inode %llu, "), + dinoc->di_version, lino); + if (!verify_mode) { + if (!no_modify) { + do_warn(_("resetting version number\n")); + dinoc->di_version =3D (fs_inode_nlink) ? + XFS_DINODE_VERSION_2 : + XFS_DINODE_VERSION_1; + *dirty =3D 1; + } else + do_warn(_("would reset version number\n")); + } + } + + /* + * blow out of here if the inode size is < 0 + */ + if ((xfs_fsize_t)be64_to_cpu(dinoc->di_size) < 0) { + if (!uncertain) + do_warn(_("bad (negative) size %lld on inode %llu\n"), + be64_to_cpu(dinoc->di_size), lino); + if (verify_mode) + return 1; + goto clear_bad_out; + } + + /* + * if not in verify mode, check to sii if the inode and imap + * agree that the inode is free + */ + if (!verify_mode && di_mode =3D=3D 0) { + /* + * was_free value is not meaningful if we're in verify mode + */ + if (was_free) { + /* + * easy case, inode free -- inode and map agree, clear + * it just in case to ensure that format, etc. are + * set correctly + */ + if (!no_modify) + *dirty +=3D clear_dinode(mp, dino, lino); + *used =3D is_free; + return 0; + } + /* + * the inode looks free but the map says it's in use. + * clear the inode just to be safe and mark the inode + * free. 
+ */ + do_warn(_("imap claims a free inode %llu is in use, "), lino); + if (!no_modify) { + do_warn(_("correcting imap and clearing inode\n")); + if (clear_dinode(mp, dino, lino)) { + retval =3D 1; + *dirty =3D 1; } + } else + do_warn(_("would correct imap and clear inode\n")); + *used =3D is_free; + return retval; + } =20 - if (no_modify && err !=3D 0) { - *cleared =3D 1; - *used =3D is_free; - *isa_dir =3D 0; - blkmap_free(dblkmap); - blkmap_free(ablkmap); - return(1); - } + /* + * because of the lack of any write ordering guarantee, it's + * possible that the core got updated but the forks didn't. + * so rather than be ambitious (and probably incorrect), + * if there's an inconsistency, we get conservative and + * just pitch the file. blow off checking formats of + * free inodes since technically any format is legal + * as we reset the inode when we re-use it. + */ + if (di_mode !=3D 0 && check_dinode_mode_format(dinoc) !=3D 0) { + if (!uncertain) + do_warn(_("bad inode format in inode %llu\n"), lino); + if (verify_mode) + return 1; + goto clear_bad_out; + } =20 - ASSERT(err =3D=3D 0); - } + if (verify_mode) + return retval; =20 - /* - * do attribute semantic-based consistency checks now - */ + /* + * clear the next unlinked field if necessary on a good + * inode only during phase 4 -- when checking for inodes + * referencing duplicate blocks. then it's safe because + * we've done the inode discovery and have found all the inodes + * we're going to find. check_dups is set to 1 only during + * phase 4. Ugly. 
+ */ + if (check_dups && !no_modify) + *dirty +=3D clear_dinode_unlinked(mp, dino); =20 - /* get this only in phase 3, not in both phase 3 and 4 */ - if (extra_attr_check) { - if ((err =3D process_attributes(mp, lino, dino, ablkmap, - &repair))) { - do_warn( - _("problem with attribute contents in inode %llu\n"), lino); - if(!repair) { - /* clear attributes if not done already */ - if (!no_modify) { - *dirty +=3D clear_dinode_attr( - mp, dino, lino); - dinoc->di_aformat =3D - XFS_DINODE_FMT_LOCAL; - } else { - do_warn( - _("would clear attr fork\n")); - } - atotblocks =3D 0; - anextents =3D 0; - } - else { - *dirty =3D 1; /* it's been repaired */ - } - } - } - blkmap_free(ablkmap); + /* set type and map type info */ =20 - } else - anextents =3D 0; + switch (di_mode & S_IFMT) { + case S_IFDIR: + type =3D XR_INO_DIR; + *isa_dir =3D 1; + break; + case S_IFREG: + if (be16_to_cpu(dinoc->di_flags) & XFS_DIFLAG_REALTIME) + type =3D XR_INO_RTDATA; + else if (lino =3D=3D mp->m_sb.sb_rbmino) + type =3D XR_INO_RTBITMAP; + else if (lino =3D=3D mp->m_sb.sb_rsumino) + type =3D XR_INO_RTSUM; + else + type =3D XR_INO_DATA; + break; + case S_IFLNK: + type =3D XR_INO_SYMLINK; + break; + case S_IFCHR: + type =3D XR_INO_CHRDEV; + break; + case S_IFBLK: + type =3D XR_INO_BLKDEV; + break; + case S_IFSOCK: + type =3D XR_INO_SOCK; + break; + case S_IFIFO: + type =3D XR_INO_FIFO; + break; + default: + do_warn(_("bad inode type %#o inode %llu\n"), + di_mode & S_IFMT, lino); + goto clear_bad_out; + } =20 /* - * enforce totblocks is 0 for misc types - */ - if (process_misc_ino_types_blocks(totblocks, lino, type)) { - if (!no_modify) { - *dirty +=3D clear_dinode(mp, dino, lino); - ASSERT(*dirty > 0); - } - *cleared =3D 1; - *used =3D is_free; - *isa_dir =3D 0; - blkmap_free(dblkmap); - return(1); - } + * type checks for superblock inodes + */ + if (process_check_sb_inodes(mp, dinoc, lino, &type, dirty) !=3D 0) + goto clear_bad_out; =20 /* - * correct space counters if required + * only regular 
files with REALTIME or EXTSIZE flags set can have + * extsize set, or directories with EXTSZINHERIT. */ - if (totblocks + atotblocks !=3D INT_GET(dinoc->di_nblocks, ARCH_CONVERT))= { - if (!no_modify) { - do_warn( - _("correcting nblocks for inode %llu, was %llu - counted %llu\n"), - lino, INT_GET(dinoc->di_nblocks, ARCH_CONVERT), - totblocks + atotblocks); - *dirty =3D 1; - INT_SET(dinoc->di_nblocks, ARCH_CONVERT, totblocks + atotblocks); - } else { - do_warn( - _("bad nblocks %llu for inode %llu, would reset to %llu\n"), - INT_GET(dinoc->di_nblocks, ARCH_CONVERT), lino, - totblocks + atotblocks); + if (dinoc->di_extsize !=3D 0) { + if ((type =3D=3D XR_INO_RTDATA) || + (type =3D=3D XR_INO_DIR && (be16_to_cpu(dinoc->di_flags) & + XFS_DIFLAG_EXTSZINHERIT)) || + (type =3D=3D XR_INO_DATA && (be16_to_cpu(dinoc->di_flags) & + XFS_DIFLAG_EXTSIZE))) { + /* s'okay */ ; + } else { + do_warn(_("bad non-zero extent size %u for " + "non-realtime/extsize inode %llu, "), + be32_to_cpu(dinoc->di_extsize), lino); + if (!no_modify) { + do_warn(_("resetting to zero\n")); + dinoc->di_extsize =3D 0; + *dirty =3D 1; + } else + do_warn(_("would reset to zero\n")); } } =20 - if (nextents > MAXEXTNUM) { - do_warn(_("too many data fork extents (%llu) in inode %llu\n"), - nextents, lino); + /* + * general size/consistency checks: + */ + if (process_check_inode_sizes(mp, dino, lino, type) !=3D 0) + goto clear_bad_out; =20 - if (!no_modify) { - *dirty +=3D clear_dinode(mp, dino, lino); - ASSERT(*dirty > 0); - } - *cleared =3D 1; - *used =3D is_free; - *isa_dir =3D 0; - blkmap_free(dblkmap); + /* + * check for illegal values of forkoff + */ + if (process_check_inode_forkoff(mp, dinoc, lino) !=3D 0) + goto clear_bad_out; =20 - return(1); - } - if (nextents !=3D INT_GET(dinoc->di_nextents, ARCH_CONVERT)) { - if (!no_modify) { - do_warn( - _("correcting nextents for inode %llu, was %d - counted %llu\n"), - lino, INT_GET(dinoc->di_nextents, ARCH_CONVERT), - nextents); - *dirty =3D 1; - 
INT_SET(dinoc->di_nextents, ARCH_CONVERT, - (xfs_extnum_t) nextents); - } else { - do_warn( - _("bad nextents %d for inode %llu, would reset to %llu\n"), - INT_GET(dinoc->di_nextents, ARCH_CONVERT), - lino, nextents); - } - } + /* + * check data fork -- if it's bad, clear the inode + */ + if (process_inode_data_fork(mp, agno, ino, dino, type, dirty, + &totblocks, &nextents, &dblkmap, check_dups) !=3D 0) + goto bad_out; =20 - if (anextents > MAXAEXTNUM) { - do_warn(_("too many attr fork extents (%llu) in inode %llu\n"), - anextents, lino); + /* + * check attribute fork if necessary. attributes are + * always stored in the regular filesystem. + */ + if (process_inode_attr_fork(mp, agno, ino, dino, type, dirty, + &atotblocks, &anextents, check_dups, extra_attr_check, + &retval)) + goto bad_out; =20 - if (!no_modify) { - *dirty +=3D clear_dinode(mp, dino, lino); - ASSERT(*dirty > 0); - } - *cleared =3D 1; - *used =3D is_free; - *isa_dir =3D 0; - blkmap_free(dblkmap); - return(1); - } - if (anextents !=3D INT_GET(dinoc->di_anextents, ARCH_CONVERT)) { - if (!no_modify) { - do_warn( - _("correcting anextents for inode %llu, was %d - counted %llu\n"), - lino, - INT_GET(dinoc->di_anextents, ARCH_CONVERT), - anextents); - *dirty =3D 1; - INT_SET(dinoc->di_anextents, ARCH_CONVERT, - (xfs_aextnum_t) anextents); - } else { - do_warn( - _("bad anextents %d for inode %llu, would reset to %llu\n"), - INT_GET(dinoc->di_anextents, ARCH_CONVERT), - lino, anextents); - } - } + /* + * enforce totblocks is 0 for misc types + */ + if (process_misc_ino_types_blocks(totblocks, lino, type)) + goto clear_bad_out; + + /* + * correct space counters if required + */ + if (process_inode_blocks_and_extents(dinoc, totblocks + atotblocks, + nextents, anextents, lino, dirty) !=3D 0) + goto clear_bad_out; =20 /* * do any semantic type-based checking here */ switch (type) { case XR_INO_DIR: - if (XFS_SB_VERSION_HASDIRV2(&mp->m_sb)) - err =3D process_dir2(mp, lino, dino, ino_discovery, - dirty, "", 
parent, dblkmap); - else - err =3D process_dir(mp, lino, dino, ino_discovery, - dirty, "", parent, dblkmap); - if (err) - do_warn( - _("problem with directory contents in inode %llu\n"), - lino); - break; - case XR_INO_RTBITMAP: - /* process_rtbitmap XXX */ - err =3D 0; - break; - case XR_INO_RTSUM: - /* process_rtsummary XXX */ - err =3D 0; + if (process_dir2(mp, lino, dino, ino_discovery, dirty, "", + parent, dblkmap) !=3D 0) { + do_warn(_("problem with directory contents in " + "inode %llu\n"), lino); + goto clear_bad_out; + } break; case XR_INO_SYMLINK: - if ((err =3D process_symlink(mp, lino, dino, dblkmap))) + if (process_symlink(mp, lino, dino, dblkmap) !=3D 0) { do_warn(_("problem with symbolic link in inode %llu\n"), lino); - break; - case XR_INO_DATA: /* fall through to FIFO case ... */ - case XR_INO_RTDATA: /* fall through to FIFO case ... */ - case XR_INO_CHRDEV: /* fall through to FIFO case ... */ - case XR_INO_BLKDEV: /* fall through to FIFO case ... */ - case XR_INO_SOCK: /* fall through to FIFO case ... */ - case XR_INO_FIFO: - err =3D 0; + goto clear_bad_out; + } break; default: - printf(_("Unexpected inode type\n")); - abort(); + break; } =20 - if (dblkmap) - blkmap_free(dblkmap); - - if (err) { - /* - * problem in the inode type-specific semantic - * checking, clear out the inode and get out - */ - if (!no_modify) { - *dirty +=3D clear_dinode(mp, dino, lino); - ASSERT(*dirty > 0); - } - *cleared =3D 1; - *used =3D is_free; - *isa_dir =3D 0; - - return(1); - } + blkmap_free(dblkmap); =20 /* * check nlinks feature, if it's a version 1 inode, * just leave nlinks alone. even if it's set wrong, * it'll be reset when read in. */ - if (dinoc->di_version > XFS_DINODE_VERSION_1 && !fs_inode_nlink) { - /* - * do we have a fs/inode version mismatch with a valid - * version 2 inode here that has to stay version 2 or - * lose links? - */ - if (INT_GET(dinoc->di_nlink, ARCH_CONVERT) > XFS_MAXLINK_1) { - /* - * yes. are nlink inodes allowed? 
- */ - if (fs_inode_nlink_allowed) { - /* - * yes, update status variable which will - * cause sb to be updated later. - */ - fs_inode_nlink =3D 1; - do_warn( - _("version 2 inode %llu claims > %u links, "), - lino, XFS_MAXLINK_1); - if (!no_modify) { - do_warn( - _("updating superblock version number\n")); - } else { - do_warn( - _("would update superblock version number\n")); - } - } else { - /* - * no, have to convert back to onlinks - * even if we lose some links - */ - do_warn( - _("WARNING: version 2 inode %llu claims > %u links, "), - lino, XFS_MAXLINK_1); - if (!no_modify) { - do_warn( - _("converting back to version 1,\n\tthis may destroy %d links\n"), - INT_GET(dinoc->di_nlink, - ARCH_CONVERT) - - XFS_MAXLINK_1); - - dinoc->di_version =3D - XFS_DINODE_VERSION_1; - INT_SET(dinoc->di_nlink, ARCH_CONVERT, - XFS_MAXLINK_1); - INT_SET(dinoc->di_onlink, ARCH_CONVERT, - XFS_MAXLINK_1); - - *dirty =3D 1; - } else { - do_warn( - _("would convert back to version 1,\n\tthis might destroy %d links\n"), - INT_GET(dinoc->di_nlink, - ARCH_CONVERT) - - XFS_MAXLINK_1); - } - } - } else { - /* - * do we have a v2 inode that we could convert back - * to v1 without losing any links? if we do and - * we have a mismatch between superblock bits and the - * version bit, alter the version bit in this case. - * - * the case where we lost links was handled above. - */ - do_warn(_("found version 2 inode %llu, "), lino); - if (!no_modify) { - do_warn(_("converting back to version 1\n")); - - dinoc->di_version =3D - XFS_DINODE_VERSION_1; - INT_SET(dinoc->di_onlink, ARCH_CONVERT, - INT_GET(dinoc->di_nlink, ARCH_CONVERT)); - - *dirty =3D 1; - } else { - do_warn(_("would convert back to version 1\n")); - } - } - } + *dirty =3D process_check_inode_nlink_version(dinoc, lino); =20 - /* - * ok, if it's still a version 2 inode, it's going - * to stay a version 2 inode. it should have a zero - * onlink field, so clear it. 
- */ - if (dinoc->di_version > XFS_DINODE_VERSION_1 && - INT_GET(dinoc->di_onlink, ARCH_CONVERT) > 0 && - fs_inode_nlink > 0) { - if (!no_modify) { - do_warn( -_("clearing obsolete nlink field in version 2 inode %llu, was %d, now 0\n"= ), - lino, INT_GET(dinoc->di_onlink, ARCH_CONVERT)); - dinoc->di_onlink =3D 0; - *dirty =3D 1; - } else { - do_warn( -_("would clear obsolete nlink field in version 2 inode %llu, currently %d\= n"), - lino, INT_GET(dinoc->di_onlink, ARCH_CONVERT)); - *dirty =3D 1; - } - } + return retval; =20 - return(retval > 0 ? 1 : 0); +clear_bad_out: + if (!no_modify) { + *dirty +=3D clear_dinode(mp, dino, lino); + ASSERT(*dirty > 0); + } +bad_out: + *used =3D is_free; + *isa_dir =3D 0; + blkmap_free(dblkmap); + return 1; } =20 /* @@ -2983,8 +2831,6 @@ _("would clear obsolete nlink field in v * claimed blocks using the bitmap. * Outs: * dirty -- whether we changed the inode (1 =3D=3D yes) - * cleared -- whether we cleared the inode (1 =3D=3D yes). In - * no modify mode, if we would have cleared it * used -- 1 if the inode is used, 0 if free. In no modify * mode, whether the inode should be used or free * isa_dir -- 1 if the inode is a directory, 0 if not. 
In
@@ -2994,30 +2840,29 @@ _("would clear obsolete nlink field in v
  */
 
 int
-process_dinode(xfs_mount_t *mp,
-		xfs_dinode_t *dino,
-		xfs_agnumber_t agno,
-		xfs_agino_t ino,
-		int was_free,
-		int *dirty,
-		int *cleared,
-		int *used,
-		int ino_discovery,
-		int check_dups,
-		int extra_attr_check,
-		int *isa_dir,
-		xfs_ino_t *parent)
+process_dinode(
+	xfs_mount_t	*mp,
+	xfs_dinode_t	*dino,
+	xfs_agnumber_t	agno,
+	xfs_agino_t	ino,
+	int		was_free,
+	int		*dirty,
+	int		*used,
+	int		ino_discovery,
+	int		check_dups,
+	int		extra_attr_check,
+	int		*isa_dir,
+	xfs_ino_t	*parent)
 {
-	const int verify_mode = 0;
-	const int uncertain = 0;
+	const int	verify_mode = 0;
+	const int	uncertain = 0;
 
 #ifdef XR_INODE_TRACE
 	fprintf(stderr, "processing inode %d/%d\n", agno, ino);
 #endif
-	return(process_dinode_int(mp, dino, agno, ino, was_free, dirty,
-				cleared, used, verify_mode, uncertain,
-				ino_discovery, check_dups, extra_attr_check,
-				isa_dir, parent));
+	return process_dinode_int(mp, dino, agno, ino, was_free, dirty, used,
+				verify_mode, uncertain, ino_discovery,
+				check_dups, extra_attr_check, isa_dir, parent);
 }
 
 /*
@@ -3027,25 +2872,24 @@ process_dinode(xfs_mount_t *mp,
  * if the inode passes the cursory sanity check, 1 otherwise.
  */
 int
-verify_dinode(xfs_mount_t *mp,
-		xfs_dinode_t *dino,
-		xfs_agnumber_t agno,
-		xfs_agino_t ino)
-{
-	xfs_ino_t parent;
-	int cleared = 0;
-	int used = 0;
-	int dirty = 0;
-	int isa_dir = 0;
-	const int verify_mode = 1;
-	const int check_dups = 0;
-	const int ino_discovery = 0;
-	const int uncertain = 0;
-
-	return(process_dinode_int(mp, dino, agno, ino, 0, &dirty,
-				&cleared, &used, verify_mode,
-				uncertain, ino_discovery, check_dups,
-				0, &isa_dir, &parent));
+verify_dinode(
+	xfs_mount_t	*mp,
+	xfs_dinode_t	*dino,
+	xfs_agnumber_t	agno,
+	xfs_agino_t	ino)
+{
+	xfs_ino_t	parent;
+	int		used = 0;
+	int		dirty = 0;
+	int		isa_dir = 0;
+	const int	verify_mode = 1;
+	const int	check_dups = 0;
+	const int	ino_discovery = 0;
+	const int	uncertain = 0;
+
+	return process_dinode_int(mp, dino, agno, ino, 0, &dirty, &used,
+				verify_mode, uncertain, ino_discovery,
+				check_dups, 0, &isa_dir, &parent);
 }
 
 /*
@@ -3054,23 +2898,22 @@ verify_dinode(xfs_mount_t *mp,
  * returns 0 if the inode passes the cursory sanity check, 1 otherwise.
  */
 int
-verify_uncertain_dinode(xfs_mount_t *mp,
-		xfs_dinode_t *dino,
-		xfs_agnumber_t agno,
-		xfs_agino_t ino)
-{
-	xfs_ino_t parent;
-	int cleared = 0;
-	int used = 0;
-	int dirty = 0;
-	int isa_dir = 0;
-	const int verify_mode = 1;
-	const int check_dups = 0;
-	const int ino_discovery = 0;
-	const int uncertain = 1;
-
-	return(process_dinode_int(mp, dino, agno, ino, 0, &dirty,
-				&cleared, &used, verify_mode,
-				uncertain, ino_discovery, check_dups,
-				0, &isa_dir, &parent));
+verify_uncertain_dinode(
+	xfs_mount_t	*mp,
+	xfs_dinode_t	*dino,
+	xfs_agnumber_t	agno,
+	xfs_agino_t	ino)
+{
+	xfs_ino_t	parent;
+	int		used = 0;
+	int		dirty = 0;
+	int		isa_dir = 0;
+	const int	verify_mode = 1;
+	const int	check_dups = 0;
+	const int	ino_discovery = 0;
+	const int	uncertain = 1;
+
+	return process_dinode_int(mp, dino, agno, ino, 0, &dirty, &used,
+				verify_mode, uncertain, ino_discovery,
+				check_dups, 0, &isa_dir, &parent);
 }
===========================================================================
xfsprogs/repair/dinode.h
===========================================================================
--- a/xfsprogs/repair/dinode.h	2007-11-15 17:24:33.000000000 +1100
+++ b/xfsprogs/repair/dinode.h	2007-11-15 17:22:18.290409320 +1100
@@ -84,7 +84,6 @@ process_dinode(xfs_mount_t *mp,
 		xfs_agino_t ino,
 		int was_free,
 		int *dirty,
-		int *tossit,
 		int *used,
 		int check_dirs,
 		int check_dups,
------------LgAlWvtlkAZ8Uw20CyzBdz--

From owner-xfs@oss.sgi.com Wed Nov 14 23:51:49 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 14 Nov 2007 23:51:57 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level:
X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,SPF_HELO_PASS autolearn=ham version=3.3.0-r574664 Received: from mail.g-house.de (ns2.g-housing.de [81.169.133.75]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAF7pjkF008793 for ; Wed, 14 Nov 2007 23:51:49 -0800 Received: from [89.59.23.42] (helo=[192.168.178.25]) by mail.g-house.de with esmtpsa (TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.63) (envelope-from ) id 1IsZYI-0003Ab-9N; Thu, 15 Nov 2007 08:54:19 +0100 Date: Thu, 15 Nov 2007 08:51:36 +0100 (CET) From: Christian Kujau X-X-Sender: evil@sheep.housecafe.de To: LKML cc: "J. Bruce Fields" , Benny Halevy , Chris Wedgwood , linux-xfs@oss.sgi.com Subject: Re: 2.6.24-rc2 XFS nfsd hang / smbd too In-Reply-To: Message-ID: References: <20071114070400.GA25708@puku.stupidest.org> <473AA72C.6020308@panasas.com> <20071114125907.GB4010@fieldses.org> User-Agent: Alpine 0.99999 (DEB 796 2007-11-08) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Virus-Scanned: ClamAV 0.91.2/4796/Wed Nov 14 16:09:59 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13674 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lists@nerdbynature.de Precedence: bulk X-list: xfs

On Wed, 14 Nov 2007, Christian Kujau wrote:
> Yes, the nfsd process only got stuck when I did ls(1) (with or without -l) on
> a NFS share which contained a XFS partition.

Since NFS was not working (the nfsd processes were already in D state), I tried to mount a CIFS share from the very same server (and the same client).
I'm exporting the same /data share (JFS), but, since it's smbd I don't have to export every single submount (as it is with NFS): * with NFS: server:/data (jfs) server:/data/sub (xfs) * with CIFS: server:/data (containing both the jfs and the xfs partition as one single share to mount) Upon accessing the /data/sub part of the CIFS share, the client hung, waiting for the server to respond (the [cifs] kernel thread on the client was spinning, waiting for i/o). On the server, similar things as with the nfsd processes happened (although I know that the smbd (Samba) processes are running completely in userspace): http://nerdbynature.de/bits/2.6.24-rc2/nfsd/debug.3.txt.gz Sysrq-t again on the server: http://nerdbynature.de/bits/2.6.24-rc2/nfsd/dmesg.3.gz smbd D c04131c0 0 22782 3039 e242ad60 00000046 e242a000 c04131c0 00000001 e7875264 00000246 e7f88a80 e242ada8 c040914c 00000000 00000002 c016dc64 e7a3b7b8 e242a000 e7875284 00000000 c016dc64 f7343d88 f6337e90 e7f88a80 e7875264 e242ad88 e7a3b7b8 Call Trace: [] mutex_lock_nested+0xcc/0x2c0 [] do_lookup+0xa4/0x190 [] __link_path_walk+0x749/0xd10 [] link_path_walk+0x44/0xc0 [] path_walk+0x18/0x20 [] do_path_lookup+0x78/0x1c0 [] __user_walk_fd+0x38/0x60 [] vfs_stat_fd+0x21/0x50 [] vfs_stat+0x11/0x20 [] sys_stat64+0x14/0x30 [] sysenter_past_esp+0x5f/0xa5 ======================= So, it's really not NFS but ?FS related? Christian. -- BOFH excuse #199: the curls in your keyboard cord are losing electricity. 
From owner-xfs@oss.sgi.com Thu Nov 15 06:44:25 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 15 Nov 2007 06:44:30 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,SPF_HELO_PASS autolearn=ham version=3.3.0-r574664 Received: from mail.g-house.de (ns2.g-housing.de [81.169.133.75]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAFEiM27009164 for ; Thu, 15 Nov 2007 06:44:24 -0800 Received: from [89.59.23.42] (helo=sheep.housecafe.de) by mail.g-house.de with esmtpsa (TLS-1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.63) (envelope-from ) id 1Isfze-00042T-4X; Thu, 15 Nov 2007 15:46:58 +0100 Received: from localhost ([127.0.0.1] helo=housecafe.dyndns.org) by sheep.housecafe.de with esmtp (Exim 4.68) (envelope-from ) id 1Isfx2-0005eu-Oh; Thu, 15 Nov 2007 15:44:16 +0100 Received: from 62.180.231.196 (SquirrelMail authenticated user evil) by housecafe.dyndns.org with HTTP; Thu, 15 Nov 2007 15:44:16 +0100 (CET) Message-ID: <23908.62.180.231.196.1195137856.squirrel@housecafe.dyndns.org> In-Reply-To: References: <20071114070400.GA25708@puku.stupidest.org> <473AA72C.6020308@panasas.com> <20071114125907.GB4010@fieldses.org> Date: Thu, 15 Nov 2007 15:44:16 +0100 (CET) Subject: Re: 2.6.24-rc2 XFS nfsd hang / smbd too From: "Christian Kujau" To: "LKML" Cc: "J. 
Bruce Fields" , "Benny Halevy" , "Chris Wedgwood" , linux-xfs@oss.sgi.com User-Agent: SquirrelMail/1.5.2 [SVN] MIME-Version: 1.0 Content-Type: text/plain;charset=ISO-8859-15 Content-Transfer-Encoding: 8bit X-Virus-Scanned: ClamAV 0.91.2/4799/Thu Nov 15 04:36:46 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13675 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lists@nerdbynature.de Precedence: bulk X-list: xfs On Thu, November 15, 2007 08:51, Christian Kujau wrote: > Since NFS was not working (the nfsd processes were already in D state), > to mount a CIFS share from the very same server (and the same client). That should read: Since NFS was not working (the nfsd processes were already in D state), I decided to mount a CIFS share from the very same server (and the same client). [...] C. -- BOFH excuse #442: Trojan horse ran out of hay From owner-xfs@oss.sgi.com Thu Nov 15 14:02:07 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 15 Nov 2007 14:02:12 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,SPF_HELO_PASS autolearn=ham version=3.3.0-r574664 Received: from mail.g-house.de (ns2.g-housing.de [81.169.133.75]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAFM26u7012315 for ; Thu, 15 Nov 2007 14:02:07 -0800 Received: from [89.59.23.42] (helo=[192.168.178.25]) by mail.g-house.de with esmtpsa (TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.63) (envelope-from ) id 1IsmpH-0006ic-9V; Thu, 15 Nov 2007 23:04:44 +0100 Date: Thu, 15 Nov 2007 23:01:56 +0100 (CET) From: Christian Kujau X-X-Sender: evil@sheep.housecafe.de To: LKML cc: "J. 
Bruce Fields" , Benny Halevy , Chris Wedgwood , linux-xfs@oss.sgi.com Subject: Re: 2.6.24-rc2 XFS nfsd hang / smbd too In-Reply-To: Message-ID: References: <20071114070400.GA25708@puku.stupidest.org> <473AA72C.6020308@panasas.com> <20071114125907.GB4010@fieldses.org> User-Agent: Alpine 0.99999 (DEB 796 2007-11-08) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Virus-Scanned: ClamAV 0.91.2/4806/Thu Nov 15 11:47:22 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13676 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lists@nerdbynature.de Precedence: bulk X-list: xfs On Thu, 15 Nov 2007, Christian Kujau wrote: > Upon accessing the /data/sub part of the CIFS share, the client hung, waiting > for the server to respond (the [cifs] kernel thread on the client was > spinning, waiting for i/o). On the server, similar things as with the nfsd > processes happened Turns out that the CIFS mount only hung because the server was already stuck because of the nfsd/XFS issue. After rebooting the server, I was able to access the CIFS shares (the xfs partition too) just fine. Yes, the xfs partition itself has been checked too and no errors were found. C. -- BOFH excuse #348: We're on Token Ring, and it looks like the token got loose. 
From owner-xfs@oss.sgi.com Thu Nov 15 16:34:07 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 15 Nov 2007 16:34:12 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from smtp124.sbc.mail.sp1.yahoo.com (smtp124.sbc.mail.sp1.yahoo.com [69.147.64.97]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAG0Y6gF006221 for ; Thu, 15 Nov 2007 16:34:07 -0800 Received: (qmail 68665 invoked from network); 16 Nov 2007 00:34:12 -0000 Received: from unknown (HELO stupidest.org) (cwedgwood@sbcglobal.net@24.5.75.45 with login) by smtp124.sbc.mail.sp1.yahoo.com with SMTP; 16 Nov 2007 00:34:12 -0000 X-YMail-OSG: 4S.Jt38VM1lltsBlXxZITep4vUf5AeeCYaA.OFGU5liUk26.yAAu0UTnA4Mf3rUYDwtTRPJubw-- Received: by tuatara.stupidest.org (Postfix, from userid 10000) id 73A3F28397FB; Thu, 15 Nov 2007 16:34:10 -0800 (PST) Date: Thu, 15 Nov 2007 16:34:10 -0800 From: Chris Wedgwood To: Christian Kujau Cc: LKML , "J. 
Bruce Fields" , Benny Halevy , linux-xfs@oss.sgi.com Subject: Re: 2.6.24-rc2 XFS nfsd hang / smbd too Message-ID: <20071116003410.GA16797@puku.stupidest.org> References: <20071114070400.GA25708@puku.stupidest.org> <473AA72C.6020308@panasas.com> <20071114125907.GB4010@fieldses.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Virus-Scanned: ClamAV 0.91.2/4807/Thu Nov 15 13:13:14 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13677 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: cw@f00f.org Precedence: bulk X-list: xfs On Thu, Nov 15, 2007 at 08:51:36AM +0100, Christian Kujau wrote: > [] mutex_lock_nested+0xcc/0x2c0 > [] do_lookup+0xa4/0x190 > [] __link_path_walk+0x749/0xd10 > [] link_path_walk+0x44/0xc0 > [] path_walk+0x18/0x20 > [] do_path_lookup+0x78/0x1c0 > [] __user_walk_fd+0x38/0x60 > [] vfs_stat_fd+0x21/0x50 > [] vfs_stat+0x11/0x20 > [] sys_stat64+0x14/0x30 > [] sysenter_past_esp+0x5f/0xa5 nfsd is already wedged and holds a lock, so this is expected. I'm not sure what you're doing here, but a viable work-around for now might be to use nfsv2 mounts, something like mount -o vers=2 ... or to keep v3 and disable readdirplus by doing something like: mount -o vers=3,nordirplus ... The latter I didn't test, but it was suggested on #linuxfs. 
From owner-xfs@oss.sgi.com Thu Nov 15 16:43:18 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 15 Nov 2007 16:43:23 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAG0hCqf007513 for ; Thu, 15 Nov 2007 16:43:15 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA25783; Fri, 16 Nov 2007 11:43:12 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAG0hBdD105884310; Fri, 16 Nov 2007 11:43:12 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAG0hAL5107833583; Fri, 16 Nov 2007 11:43:10 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Fri, 16 Nov 2007 11:43:10 +1100 From: David Chinner To: Lachlan McIlroy Cc: David Chinner , xfs-oss , xfs-dev Subject: Re: [PATCH, RFC] Move AIL pushing into a separate thread Message-ID: <20071116004310.GL66820511@sgi.com> References: <20071105050706.GW66820511@sgi.com> <473BBDC1.2020107@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <473BBDC1.2020107@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4807/Thu Nov 15 13:13:14 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13678 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Thu, Nov 15, 2007 at 02:32:17PM +1100, Lachlan McIlroy wrote: > Overall 
it looks good Dave, just a few comments below. ..... > >+int > >+xfsaild( > >+ void *data) > >+{ > >+ xfs_mount_t *mp = (xfs_mount_t *)data; > >+ xfs_lsn_t last_pushed_lsn = 0; > >+ long tout = 0; > >+ > >+ while (!kthread_should_stop()) { > >+ if (tout) > >+ schedule_timeout_interruptible(msecs_to_jiffies(tout)); > >+ > >+ /* swsusp */ > >+ try_to_freeze(); > >+ > >+ /* we're either starting or stopping if there is no log */ > >+ if (!mp->m_log) > >+ continue; > It's looks like the log should never be NULL while the xfsaild > thread is running. Could we ASSERT(mp->m_log)? Already done. > >@@ -100,57 +97,105 @@ xfs_trans_push_ail( > > spin_unlock(&mp->m_ail_lock); > > return (xfs_lsn_t)0; > > } > >+ if (XFS_LSN_CMP(threshold_lsn, mp->m_ail.xa_target) > 0) > Is this conditional necessary? Can we just call xfsaild_wakeup() > and let it do the same thing? Already done. > >+long > >+xfsaild_push( > >+ xfs_mount_t *mp, > >+ xfs_lsn_t *last_lsn) > >+{ > >+ long tout = 100; /* milliseconds */ > >+ xfs_lsn_t last_pushed_lsn = *last_lsn; > >+ xfs_lsn_t target = mp->m_ail.xa_target; > >+ xfs_lsn_t lsn; > >+ xfs_log_item_t *lip; > >+ int lock_result; > >+ int gen; > >+ int restarts; > restarts needs to be initialised Already done. > >+ spin_lock(&mp->m_ail_lock); > >+ count++; > >+ /* Too many items we can't do anything with? */ > >+ if (stuck > 100) > 100? Arbitrary magic number or was there reason for this? Arbitrary magic number based on observation. Basically, if we are skipping too many items because we can't flush them or they are already being flushed, we back off and give them time to complete whatever operation is being done, i.e. we remove pressure from the AIL while we can't make progress so traversals don't slow down further inserts and removals to/from the AIL. > >+ break; > >+ /* we're either starting or stopping if there is no log */ > >+ if (!mp->m_log) > Again, can we ASSERT(mp->m_log)? Already done. 
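To make the back-off rule described above concrete, here is a minimal user-space sketch. It is not the xfsaild_push() code: the names scan_items, struct scan_stats, enum push_result and STUCK_LIMIT are made up for illustration, and the item list is just an array. Only the count/stuck counters and the "stuck > 100" check are taken from the quoted patch hunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-item outcome of one push attempt (not an XFS type). */
enum push_result { PUSH_OK, PUSH_STUCK };

/* Hypothetical summary of one scan (not an XFS structure). */
struct scan_stats {
	int count;	/* items examined */
	int stuck;	/* items we could not flush */
};

/* The arbitrary limit discussed above. */
#define STUCK_LIMIT	100

/*
 * Walk the items in order.  Once more than STUCK_LIMIT of them could
 * not be flushed, stop scanning so the caller can sleep for a while
 * and retry, instead of repeatedly traversing a list it cannot
 * make progress on.
 */
static struct scan_stats
scan_items(const enum push_result *items, size_t nitems)
{
	struct scan_stats st = { 0, 0 };
	size_t i;

	assert(items != NULL || nitems == 0);

	for (i = 0; i < nitems; i++) {
		if (items[i] == PUSH_STUCK)
			st.stuck++;
		st.count++;
		/* Too many items we can't do anything with? */
		if (st.stuck > STUCK_LIMIT)
			break;
	}
	return st;
}
```

With a list in which nothing can be flushed, the scan gives up after 101 items rather than walking the whole list; the caller would then lengthen its sleep before the next pass.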
> >+ if ((XFS_LSN_CMP(lsn, target) >= 0)) { > >+ tout += 20; > >+ last_pushed_lsn = 0; > >+ } else if ((restarts > XFS_TRANS_PUSH_AIL_RESTARTS) || > >+ (count && (count < (stuck + 10)))) { > If 0 < count < 10 and stuck == 0 then we'll think we couldn't flush > something > - not sure if that is what you intended here. If we've got to this situation it generally means we've got an empty AIL. Hence backing off a bit more won't hurt at all because the log is pretty much clean and we are not likely to be in a tail-pushing situation in the next few milliseconds. > Maybe ((count - stuck) < stuck) ? ie the number of items we successfully > flushed > is less than the number of items we couldn't flush then back off. Sort of, but that's a 50% rule - what I'm checking is more like a 90% stuck threshold when we break out of the loop at stuck == 100. If you can think of a better way of expressing that.... Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Thu Nov 15 17:21:03 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 15 Nov 2007 17:21:07 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAG1KuFE012192 for ; Thu, 15 Nov 2007 17:20:58 -0800 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA26751; Fri, 16 Nov 2007 12:20:55 +1100 Message-ID: <473CF018.6050806@sgi.com> Date: Fri, 16 Nov 2007 12:19:20 +1100 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.9 (X11/20071031) MIME-Version: 1.0 To: David Chinner CC: xfs-oss , xfs-dev Subject: Re: [PATCH, RFC] Move AIL pushing into a separate 
thread References: <20071105050706.GW66820511@sgi.com> <473BBDC1.2020107@sgi.com> <20071116004310.GL66820511@sgi.com> In-Reply-To: <20071116004310.GL66820511@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-Virus-Scanned: ClamAV 0.91.2/4807/Thu Nov 15 13:13:14 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13679 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs David Chinner wrote: > On Thu, Nov 15, 2007 at 02:32:17PM +1100, Lachlan McIlroy wrote: >> Overall it looks good Dave, just a few comments below. > ..... >>> +int >>> +xfsaild( >>> + void *data) >>> +{ >>> + xfs_mount_t *mp = (xfs_mount_t *)data; >>> + xfs_lsn_t last_pushed_lsn = 0; >>> + long tout = 0; >>> + >>> + while (!kthread_should_stop()) { >>> + if (tout) >>> + schedule_timeout_interruptible(msecs_to_jiffies(tout)); >>> + >>> + /* swsusp */ >>> + try_to_freeze(); >>> + >>> + /* we're either starting or stopping if there is no log */ >>> + if (!mp->m_log) >>> + continue; >> It's looks like the log should never be NULL while the xfsaild >> thread is running. Could we ASSERT(mp->m_log)? > > Already done. > >>> @@ -100,57 +97,105 @@ xfs_trans_push_ail( >>> spin_unlock(&mp->m_ail_lock); >>> return (xfs_lsn_t)0; >>> } >>> + if (XFS_LSN_CMP(threshold_lsn, mp->m_ail.xa_target) > 0) >> Is this conditional necessary? Can we just call xfsaild_wakeup() >> and let it do the same thing? > > Already done. > >>> +long >>> +xfsaild_push( >>> + xfs_mount_t *mp, >>> + xfs_lsn_t *last_lsn) >>> +{ >>> + long tout = 100; /* milliseconds */ >>> + xfs_lsn_t last_pushed_lsn = *last_lsn; >>> + xfs_lsn_t target = mp->m_ail.xa_target; >>> + xfs_lsn_t lsn; >>> + xfs_log_item_t *lip; >>> + int lock_result; >>> + int gen; >>> + int restarts; >> restarts needs to be initialised > > Already done. 
> >>> + spin_lock(&mp->m_ail_lock); >>> + count++; >>> + /* Too many items we can't do anything with? */ >>> + if (stuck > 100) >> 100? Arbitrary magic number or was there reason for this? > > Arbitrary magic number based on observation. basically, if > we are skipping too many items because we can't flush them or > they are already being flushed we back off and given them time > to complete whatever operation is being done. i.e. remove pressure > from the AIL while we can't make progress so traversals don't > slow down further inserts and removæls to/from the AIL. > >>> + break; >>> + /* we're either starting or stopping if there is no log */ >>> + if (!mp->m_log) >> Again, can we ASSERT(mp->m_log)? > > Already done. > >>> + if ((XFS_LSN_CMP(lsn, target) >= 0)) { >>> + tout += 20; >>> + last_pushed_lsn = 0; >>> + } else if ((restarts > XFS_TRANS_PUSH_AIL_RESTARTS) || >>> + (count && (count < (stuck + 10)))) { >> If 0 < count < 10 and stuck == 0 then we'll think we couldn't flush >> something >> - not sure if that is what you intended here. > > If we've got to this situation it generally means we've > got an empty AIL. Hence backing off a bit more won't hurt at > all because the log is pretty much clean and we are not likely > to be in a tail-pushing situation in the next few milliseconds. Ah, yes, good point. > >> Maybe ((count - stuck) < stuck) ? ie the number of items we successfully >> flushed >> is less than the number of items we couldn't flush then back off. > > Sort of, but that's a 50% rule - what I'm checking is more like a > 90% stuck threshold when we break out of the loop at stuck == 100. > If you can think of a better way of expressing that.... something like ((stuck * 100)/count > 90) ? 
or adding a bias to the 50% rule, ((count - stuck) * 10 < stuck) From owner-xfs@oss.sgi.com Thu Nov 15 20:36:21 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 15 Nov 2007 20:36:24 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_66, T_STOX_BOUND_090909_B autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAG4aH3n003812 for ; Thu, 15 Nov 2007 20:36:19 -0800 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA02205; Fri, 16 Nov 2007 15:36:15 +1100 Message-ID: <473D1DE0.1090106@sgi.com> Date: Fri, 16 Nov 2007 15:34:40 +1100 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.9 (X11/20071031) MIME-Version: 1.0 To: David Chinner CC: xfs-dev , xfs-oss Subject: Re: [PATCH] bulkstat fixups References: <4733EEF2.9010504@sgi.com> <20071111214759.GS995458@sgi.com> <4737C11D.8030007@sgi.com> <20071112041121.GT66820511@sgi.com> In-Reply-To: <20071112041121.GT66820511@sgi.com> Content-Type: multipart/mixed; boundary="------------050609080409070302010801" X-Virus-Scanned: ClamAV 0.91.2/4808/Thu Nov 15 16:53:47 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13680 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs This is a multi-part message in MIME format. 
--------------050609080409070302010801 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Updated patch - I added cond_resched() calls into each loop - for loops that have a 'continue' somewhere in them I added the cond_resched() at the start, otherwise I put it at the end. David Chinner wrote: > On Mon, Nov 12, 2007 at 01:57:33PM +1100, Lachlan McIlroy wrote: >> David Chinner wrote: >>> [Lachlan, can you wrap your email text at 72 columns for ease of quoting?] >>> >>> On Fri, Nov 09, 2007 at 04:24:02PM +1100, Lachlan McIlroy wrote: >>>> Here's a collection of fixups for bulkstat for all the remaining issues. >>>> >>>> - sanity check for NULL user buffer in xfs_ioc_bulkstat[_compat]() >>> OK. >>> >>>> - remove the special case for XFS_IOC_FSBULKSTAT with count == 1. This >>>> special >>>> case causes bulkstat to fail because the special case uses >>>> xfs_bulkstat_single() >>>> instead of xfs_bulkstat() and the two functions have different >>>> semantics. >>>> xfs_bulkstat() will return the next inode after the one supplied while >>>> skipping >>>> internal inodes (ie quota inodes). xfs_bulkstate_single() will only >>>> lookup the >>>> inode supplied and return an error if it is an internal inode. >>> Userspace visile change. What applications do we have that rely on this >>> behaviour that will be broken by this change? >> Any apps that rely on the existing behaviour are probably broken. If an app >> wants to call xfs_bulkstat_single() it should use XFS_IOC_FSBULKSTAT_SINGLE. > > Perhaps, but we can't arbitrarily decide that those apps will now break on > a new kernel with this change. At minimum we need to audit all of the code > we have that uses bulkstat for such breakage (including DMF!) before we make a > change like this. I've looked through everything we have in xfs-cmds and nothing relies on this bug being present. 
Vlad helped me with the DMF side - DMF does not use the XFS_IOC_FSBULKSTAT ioctl; it has its own interface into the kernel which calls xfs_bulkstat() directly, so it won't be affected by this change. > >>>> - checks against 'ubleft' (the space left in the user's buffer) should be >>>> against >>>> 'statstruct_size' which is the supplied minimum object size. The >>>> mixture of >>>> checks against statstruct_size and 0 was one of the reasons we were >>>> skipping >>>> inodes. >>> Can you wrap these checks in a static inline function so that it is obvious >>> what the correct way to check is and we don't reintroduce this porblem? >>> i.e. >>> >>> static inline int >>> xfs_bulkstat_ubuffer_large_enough(ssize_t space) >>> { >>> return (space > sizeof(struct blah)); >>> } >>> >>> That will also remove a stack variable.... >> That won't work - statstruct_size is passed into xfs_bulkstat() so we don't >> know what 'blah' is. Maybe a macro would be easier. >> >> #define XFS_BULKSTAT_UBLEFT (ubleft >= statstruct_size) > > Yeah, something like that, but I don't like macros with no parameters used > like that.... > >>> FWIW - missing from this set of patches - cpu_relax() in the loops. In the >>> case >>> where no I/O is required to do the scan, we can hold the cpu for a long >>> time >>> and that will hold off I/O completion, etc for the cpu bulkstat is running >>> on. >>> Hence after every cluster we scan we should cpu_relax() to allow other >>> processes cpu time on that cpu. >>> >> I don't get how cpu_relax() works. I see that it is called at times with a >> spinlock held so it wont trigger a context switch. Does it give interrupts >> a chance to run? > > Sorry, my mistake - confused cpu_relax() with cond_resched(). take the above > paragraph and s/cpu_relax/cond_resched/g > >> It appears to be used where a minor delay is needed - I don't think we have >> any >> cases in xfs_bulkstat() where we need to wait for an event that isn't I/O. 
> > The issue is when we're hitting cached buffers and we never end up waiting > for I/O - we will then monopolise the cpu we are running on and hold off > all other processing. It's antisocial and leads to high latencies for other > code. > > Cheers, > > Dave. --------------050609080409070302010801 Content-Type: text/x-patch; name="bulkstat.diff" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="bulkstat.diff" --- fs/xfs/xfs_itable.c_1.157 2007-10-25 17:22:09.000000000 +1000 +++ fs/xfs/xfs_itable.c 2007-11-16 14:52:31.000000000 +1100 @@ -316,6 +316,8 @@ xfs_bulkstat_use_dinode( return 1; } +#define XFS_BULKSTAT_UBLEFT(ubleft) ((ubleft) >= statstruct_size) + /* * Return stat information in bulk (by-inode) for the filesystem. */ @@ -353,7 +355,7 @@ xfs_bulkstat( xfs_inobt_rec_incore_t *irbp; /* current irec buffer pointer */ xfs_inobt_rec_incore_t *irbuf; /* start of irec buffer */ xfs_inobt_rec_incore_t *irbufend; /* end of good irec buffer entries */ - xfs_ino_t lastino=0; /* last inode number returned */ + xfs_ino_t lastino; /* last inode number returned */ int nbcluster; /* # of blocks in a cluster */ int nicluster; /* # of inodes in a cluster */ int nimask; /* mask for inode clusters */ @@ -373,6 +375,7 @@ xfs_bulkstat( * Get the last inode value, see if there's nothing to do. */ ino = (xfs_ino_t)*lastinop; + lastino = ino; dip = NULL; agno = XFS_INO_TO_AGNO(mp, ino); agino = XFS_INO_TO_AGINO(mp, ino); @@ -382,6 +385,9 @@ xfs_bulkstat( *ubcountp = 0; return 0; } + if (!ubcountp || *ubcountp <= 0) { + return EINVAL; + } ubcount = *ubcountp; /* statstruct's */ ubleft = ubcount * statstruct_size; /* bytes */ *ubcountp = ubelem = 0; @@ -402,7 +408,8 @@ xfs_bulkstat( * inode returned; 0 means start of the allocation group. 
*/ rval = 0; - while (ubleft >= statstruct_size && agno < mp->m_sb.sb_agcount) { + while (XFS_BULKSTAT_UBLEFT(ubleft) && agno < mp->m_sb.sb_agcount) { + cond_resched(); bp = NULL; down_read(&mp->m_peraglock); error = xfs_ialloc_read_agi(mp, NULL, agno, &agbp); @@ -499,6 +506,7 @@ xfs_bulkstat( break; error = xfs_inobt_lookup_ge(cur, agino, 0, 0, &tmp); + cond_resched(); } /* * If ran off the end of the ag either with an error, @@ -542,6 +550,7 @@ xfs_bulkstat( */ agino = gino + XFS_INODES_PER_CHUNK; error = xfs_inobt_increment(cur, 0, &tmp); + cond_resched(); } /* * Drop the btree buffers and the agi buffer. @@ -555,15 +564,16 @@ xfs_bulkstat( */ irbufend = irbp; for (irbp = irbuf; - irbp < irbufend && ubleft >= statstruct_size; irbp++) { + irbp < irbufend && XFS_BULKSTAT_UBLEFT(ubleft); irbp++) { /* * Now process this chunk of inodes. */ for (agino = irbp->ir_startino, chunkidx = clustidx = 0; - ubleft > 0 && + XFS_BULKSTAT_UBLEFT(ubleft) && irbp->ir_freecount < XFS_INODES_PER_CHUNK; chunkidx++, clustidx++, agino++) { ASSERT(chunkidx < XFS_INODES_PER_CHUNK); + cond_resched(); /* * Recompute agbno if this is the * first inode of the cluster. @@ -663,15 +673,13 @@ xfs_bulkstat( ubleft, private_data, bno, &ubused, dip, &fmterror); if (fmterror == BULKSTAT_RV_NOTHING) { - if (error == EFAULT) { - ubleft = 0; - rval = error; - break; - } - else if (error == ENOMEM) + if (error && error != ENOENT && + error != EINVAL) { ubleft = 0; - else - lastino = ino; + rval = error; + break; + } + lastino = ino; continue; } if (fmterror == BULKSTAT_RV_GIVEUP) { @@ -686,6 +694,8 @@ xfs_bulkstat( ubelem++; lastino = ino; } + + cond_resched(); } if (bp) @@ -694,11 +704,12 @@ xfs_bulkstat( /* * Set up for the next loop iteration. 
*/ - if (ubleft > 0) { + if (XFS_BULKSTAT_UBLEFT(ubleft)) { if (end_of_ag) { agno++; agino = 0; - } + } else + agino = XFS_INO_TO_AGINO(mp, lastino); } else break; } @@ -707,6 +718,11 @@ xfs_bulkstat( */ kmem_free(irbuf, irbsize); *ubcountp = ubelem; + /* + * Found some inodes, return them now and return the error next time. + */ + if (ubelem) + rval = 0; if (agno >= mp->m_sb.sb_agcount) { /* * If we ran out of filesystem, mark lastino as off --------------050609080409070302010801-- From owner-xfs@oss.sgi.com Thu Nov 15 20:44:33 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 15 Nov 2007 20:44:37 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_66 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAG4iS2L005005 for ; Thu, 15 Nov 2007 20:44:31 -0800 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA02369; Fri, 16 Nov 2007 15:44:29 +1100 Message-ID: <473D1FCE.8030705@sgi.com> Date: Fri, 16 Nov 2007 15:42:54 +1100 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.9 (X11/20071031) MIME-Version: 1.0 To: lachlan@sgi.com CC: David Chinner , xfs-dev , xfs-oss Subject: Re: [PATCH] bulkstat fixups References: <4733EEF2.9010504@sgi.com> <20071111214759.GS995458@sgi.com> <4737C11D.8030007@sgi.com> <20071112041121.GT66820511@sgi.com> <473D1DE0.1090106@sgi.com> In-Reply-To: <473D1DE0.1090106@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4808/Thu Nov 15 16:53:47 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13681 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com 
Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs Forgot to mention - this patch is just fs/xfs/xfs_itable.c. That's the only file that has been updated since the last patch. Lachlan McIlroy wrote: > Updated patch - I added cond_resched() calls into each loop - for loops > that > have a 'continue' somewhere in them I added the cond_resched() at the > start, > otherwise I put it at the end. > > David Chinner wrote: >> On Mon, Nov 12, 2007 at 01:57:33PM +1100, Lachlan McIlroy wrote: >>> David Chinner wrote: >>>> [Lachlan, can you wrap your email text at 72 columns for ease of >>>> quoting?] >>>> >>>> On Fri, Nov 09, 2007 at 04:24:02PM +1100, Lachlan McIlroy wrote: >>>>> Here's a collection of fixups for bulkstat for all the remaining >>>>> issues. >>>>> >>>>> - sanity check for NULL user buffer in xfs_ioc_bulkstat[_compat]() >>>> OK. >>>> >>>>> - remove the special case for XFS_IOC_FSBULKSTAT with count == 1. >>>>> This special >>>>> case causes bulkstat to fail because the special case uses >>>>> xfs_bulkstat_single() >>>>> instead of xfs_bulkstat() and the two functions have different >>>>> semantics. >>>>> xfs_bulkstat() will return the next inode after the one supplied >>>>> while skipping >>>>> internal inodes (ie quota inodes). xfs_bulkstate_single() will >>>>> only lookup the >>>>> inode supplied and return an error if it is an internal inode. >>>> Userspace visile change. What applications do we have that rely on this >>>> behaviour that will be broken by this change? >>> Any apps that rely on the existing behaviour are probably broken. If >>> an app >>> wants to call xfs_bulkstat_single() it should use >>> XFS_IOC_FSBULKSTAT_SINGLE. >> >> Perhaps, but we can't arbitrarily decide that those apps will now >> break on >> a new kernel with this change. At minimum we need to audit all of the >> code >> we have that uses bulkstat for such breakage (including DMF!) before >> we make a >> change like this. 
>
> I've looked through everything we have in xfs-cmds and nothing relies
> on this bug being present. Vlad helped me with the DMF side - DMF does
> not use the XFS_IOC_FSBULKSTAT ioctl, it has its own interface into
> the kernel which calls xfs_bulkstat() directly so it won't be affected
> by this change.
>
>>
>>>>> - checks against 'ubleft' (the space left in the user's buffer)
>>>>>   should be against 'statstruct_size' which is the supplied
>>>>>   minimum object size. The mixture of checks against
>>>>>   statstruct_size and 0 was one of the reasons we were skipping
>>>>>   inodes.
>>>> Can you wrap these checks in a static inline function so that it is
>>>> obvious what the correct way to check is and we don't reintroduce
>>>> this problem? i.e.
>>>>
>>>> static inline int
>>>> xfs_bulkstat_ubuffer_large_enough(ssize_t space)
>>>> {
>>>>     return (space > sizeof(struct blah));
>>>> }
>>>>
>>>> That will also remove a stack variable....
>>> That won't work - statstruct_size is passed into xfs_bulkstat() so
>>> we don't know what 'blah' is. Maybe a macro would be easier.
>>>
>>> #define XFS_BULKSTAT_UBLEFT (ubleft >= statstruct_size)
>>
>> Yeah, something like that, but I don't like macros with no parameters
>> used like that....
>>
>>>> FWIW - missing from this set of patches - cpu_relax() in the loops.
>>>> In the case where no I/O is required to do the scan, we can hold
>>>> the cpu for a long time and that will hold off I/O completion, etc
>>>> for the cpu bulkstat is running on. Hence after every cluster we
>>>> scan we should cpu_relax() to allow other processes cpu time on
>>>> that cpu.
>>>>
>>> I don't get how cpu_relax() works. I see that it is called at times
>>> with a spinlock held so it won't trigger a context switch. Does it
>>> give interrupts a chance to run?
>>
>> Sorry, my mistake - confused cpu_relax() with cond_resched(). Take
>> the above paragraph and s/cpu_relax/cond_resched/g
>>
>>> It appears to be used where a minor delay is needed - I don't think
>>> we have any cases in xfs_bulkstat() where we need to wait for an
>>> event that isn't I/O.
>>
>> The issue is when we're hitting cached buffers and we never end up
>> waiting for I/O - we will then monopolise the cpu we are running on
>> and hold off all other processing. It's antisocial and leads to high
>> latencies for other code.
>>
>> Cheers,
>>
>> Dave.

From owner-xfs@oss.sgi.com Thu Nov 15 21:39:17 2007
Received: with ECARTIS (v1.0.0; list xfs); Thu, 15 Nov 2007 21:39:21 -0800 (PST)
X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-1.2 required=5.0 tests=BAYES_40,RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.0-r574664
Received: from ipmail02.adl2.internode.on.net (ipmail02.adl2.internode.on.net [203.16.214.141]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAG5dFJS011510 for ; Thu, 15 Nov 2007 21:39:17 -0800
Received: from ppp121-44-254-28.lns4.mel4.internode.on.net (HELO jdc.jasonjgw.net) ([121.44.254.28]) by ipmail02.adl2.internode.on.net with ESMTP; 16 Nov 2007 15:51:39 +1030
Received: by jdc.jasonjgw.net (Postfix, from userid 1000) id 56D8210000126; Fri, 16 Nov 2007 16:21:29 +1100 (EST)
Date: Fri, 16 Nov 2007 16:21:29 +1100
From: Jason White
To: Linux-xfs
Subject: SeLinux "fixfiles relabel" shuts down XFS
Message-ID: <20071116052129.GA4401@jdc.jasonjgw.net>
Mail-Followup-To: Linux-xfs
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.17 (2007-11-01)
X-Virus-Scanned: ClamAV 0.91.2/4808/Thu Nov 15 16:53:47 2007 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 13682
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: jason@jasonjgw.net
Precedence: bulk
X-list: xfs

This isn't intended as
a bug report, more an FYI, as I don't have time just now to reproduce
the problem, and unfortunately I wasn't able to obtain any error log.

I ran fixfiles relabel on an XFS partition, kernel 2.6.22 (Debian). The
fixfiles script terminated with an i/o error and I couldn't access the
file system thereafter - I assume there was an XFS shutdown. After the
inevitable reboot, xfs_check showed no file system errors, and there
have been no subsequent i/o errors indicating possible hardware
problems.

This is on a 146 GB SAS volume in my home workstation, comprising two
73 GB drives connected to an LSI SAS controller. Everything has been
otherwise stable and the hardware is almost new. This was the beginning
of an attempt to set up SELinux on the machine, which I won't have time
to try again for a while.

This isn't good enough as a bug report, so I'll leave it for now, with
the promise that at some point (not soon, regrettably) I'll try the
operation again and take more care to capture any console messages. If
anyone else happens to reproduce this before I get around to trying it,
I'll be interested in following the progress of the bug.

From owner-xfs@oss.sgi.com Thu Nov 15 22:04:00 2007
Received: with ECARTIS (v1.0.0; list xfs); Thu, 15 Nov 2007 22:04:03 -0800 (PST)
X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-2.1 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664
Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAG63uhI014653 for ; Thu, 15 Nov 2007 22:03:59 -0800
Received: from [134.14.55.89] (soarer.melbourne.sgi.com [134.14.55.89]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA04113; Fri, 16 Nov 2007 17:03:59 +1100
Message-ID: <473D32D9.2020500@sgi.com>
Date: Fri, 16 Nov 2007 17:04:09 +1100
From: Vlad Apostolov
User-Agent: Thunderbird 2.0.0.6 (X11/20070728)
MIME-Version: 1.0
To: Barry Naujok
CC: "xfs@oss.sgi.com" , xfs-dev
Subject: Re: REVIEW: xfs_reno #2
References:
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV 0.91.2/4808/Thu Nov 15 16:53:47 2007 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 13683
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: vapo@sgi.com
Precedence: bulk
X-list: xfs

Barry Naujok wrote:
> A couple changes from the first xfs_reno:
>
>   - Major one is that symlinks are now supported, but only
>     owner, group and extended attributes are copied for them
>     (not times or inode attributes).
>
>   - Man page!
>
>
> To make this better, ideally we need some form of
> "swap inodes" function in the kernel, where the entire
> contents of the inode themselves are swapped. This form
> can handle any inode and without any of the dir/file/attr/etc
> copy/swap mechanisms we have in xfs_reno.
>
> Barry.

Hi Barry,

The code is looking good. Some questions and minor remarks below.

- init_nodehash() return value is not used

- Why is poll_output volatile?

- I think you meant "exit()" instead of "goto quit" below as
  "recover_fd" is not opened yet:

        if (n_opt)
                goto quit;
        ...
    quit:
        free_nodehash();
        close(recover_fd);

- Is dirname(xxx) used as intended? I think it should be
  xxx = dirname(xxx).

- Some log_message() strings don't follow the _("text") convention.

I see that you take care to copy the DMAPI fields as well.
Unfortunately changing the inode number in a DMAPI filesystem would
make the DMAPI handle different, which means any application using
DMAPI would not be able to access the new file anymore.

When the XFS parent pointers feature is released we would need to find
out how to update the EA to point to the new inode parent directory.
This may not be that easy though.

Regards,
Vlad

From owner-xfs@oss.sgi.com Thu Nov 15 22:16:57 2007
Received: with ECARTIS (v1.0.0; list xfs); Thu, 15 Nov 2007 22:17:02 -0800 (PST)
X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-2.3 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_21, J_CHICKENPOX_34,J_CHICKENPOX_43,J_CHICKENPOX_44 autolearn=no version=3.3.0-r574664
Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAG6Gnip016490 for ; Thu, 15 Nov 2007 22:16:54 -0800
Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA04395; Fri, 16 Nov 2007 17:16:54 +1100
Message-ID: <473D3577.5080409@sgi.com>
Date: Fri, 16 Nov 2007 17:15:19 +1100
From: Lachlan McIlroy
Reply-To: lachlan@sgi.com
User-Agent: Thunderbird 2.0.0.9 (X11/20071031)
MIME-Version: 1.0
To: xfs-masters@oss.sgi.com
CC: Eric Sandeen , xfs@oss.sgi.com, kernel-janitors@vger.kernel.org
Subject: Re: [xfs-masters] [PATCH] remove BPCSHIFT and NB?P?
 macros - was fs/xfs: remove duplicated defines
References: <473796EF.6050104@sandeen.net> <473799A9.9000200@sgi.com>
 <47379E42.2030006@sgi.com> <47379F5A.20107@sgi.com>
 <20071112020445.GY995458@sgi.com> <473937AC.2080901@sgi.com>
 <20071113213013.GZ995458@sgi.com> <473A619C.8040206@sgi.com>
 <20071114054100.GJ995458@sgi.com> <473B98AC.2000600@sgi.com>
 <20071115062728.GH66820511@sgi.com> <473D2BFB.6040702@sgi.com>
In-Reply-To: <473D2BFB.6040702@sgi.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV 0.91.2/4808/Thu Nov 15 16:53:47 2007 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 13684
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: lachlan@sgi.com
Precedence: bulk
X-list: xfs

Looks good Tim. Nice work.

Timothy Shimmin wrote:
> David Chinner wrote:
>>> While here, looking at a few others...
>>> #define NBPP PAGE_SIZE
>>> #define NDPP (1 << (PAGE_SHIFT - 9)) <--- not used - another to nuke
>>> #define NBPC PAGE_SIZE <----- used once
>>>
>>> grep -Ir 'NBPC' . | egrep -v 'tag|anot|diff'
>>> ./linux-2.6/xfs_linux.h:#define NBPC PAGE_SIZE /* Number of bytes per click */
>>> ./xfs_itable.c: irbuf = kmem_zalloc_greedy(&irbsize, NBPC, NBPC * 4,
>>>
>>> > grep -Ir 'NBPP' . | egrep -v 'tag|anot|diff|NBPPR'
>>> ./linux-2.6/xfs_linux.h:#define NBPP PAGE_SIZE
>>> ./quota/xfs_qm.h:#define XFS_QM_HASHSIZE_LOW (NBPP / sizeof(xfs_dqhash_t))
>>> ./quota/xfs_qm.h:#define XFS_QM_HASHSIZE_HIGH ((NBPP * 4) / sizeof(xfs_dqhash_t))
>> PAGE_SIZE
>>
>>> ./xfs_bmap.c: } else if (mp->m_sb.sb_blocksize >= NBPP) {
>>> ./xfs_bmap.c: args.prod = NBPP >> mp->m_sb.sb_blocklog;
>> PAGE_CACHE_SIZE
>>
>>> ./xfs_itable.c: bcount = MIN(left, (int)(NBPP / sizeof(*buffer)));
>> PAGE_SIZE
>>
>>> ./xfs_log.c: kmem_free(tic, NBPP);
>>> ./xfs_log.c: uint i = (NBPP / sizeof(xlog_ticket_t)) - 2;
>>> ./xfs_log.c: buf = (xfs_caddr_t) kmem_zalloc(NBPP, KM_SLEEP);
>> We should replace that hand rolled ticket allocator with a slab cache.
>>
> In another patch though.
>
>>> ./xfs_vnodeops.c: rounding = max_t(uint, 1 << mp->m_sb.sb_blocklog, NBPP);
>> PAGE_CACHE_SIZE. (as suggested ;)
>>
>>> Might as well get rid of NBPC and replace by NBPP.
>>>
>>> Is it just worth s/NBPC/PAGE_SIZE/g ?
>> yup.
>>
> Thanks.
>
> Patch:
> -------------
>
> Remove the BPCSHIFT and NB* based macros from XFS.
>
> The BPCSHIFT based macros, btoc*, ctob*, offtoc* and ctooff
> are either not used or don't need to be used.
> The NBPP, NDPP and NBPC macros don't need to be used but instead
> are replaced directly by PAGE_SIZE and PAGE_CACHE_SIZE
> where appropriate.
> Initial patch and motivation from Nicolas Kaiser.
>
> Signed-Off-By: Tim Shimmin
> ---
>  b/fs/xfs/linux-2.6/xfs_linux.h |   28 +---------------------------
>  b/fs/xfs/linux-2.6/xfs_lrw.c   |    9 ++++-----
>  b/fs/xfs/quota/xfs_qm.h        |    4 ++--
>  b/fs/xfs/xfs_bmap.c            |    4 ++--
>  b/fs/xfs/xfs_itable.c          |    4 ++--
>  b/fs/xfs/xfs_log.c             |    6 +++---
>  b/fs/xfs/xfs_vnodeops.c        |    9 +++------
>  7 files changed, 17 insertions(+), 47 deletions(-)
>
> ===========================================================================
> Index: fs/xfs/linux-2.6/xfs_linux.h
> ===========================================================================
>
> --- a/fs/xfs/linux-2.6/xfs_linux.h 2007-11-16 16:22:40.000000000 +1100
> +++ b/fs/xfs/linux-2.6/xfs_linux.h 2007-11-16 15:59:22.299466117 +1100
> @@ -143,43 +143,17 @@
>
>  #define spinlock_destroy(lock)
>
> -#define NBPP PAGE_SIZE
> -#define NDPP (1 << (PAGE_SHIFT - 9))
> -
>  #define NBBY 8 /* number of bits per byte */
> -#define NBPC PAGE_SIZE /* Number of bytes per click */
> -#define BPCSHIFT PAGE_SHIFT /* LOG2(NBPC) if exact */
>
>  /*
>   * Size of block device i/o is parameterized here.
>   * Currently the system supports page-sized i/o.
>   */
> -#define BLKDEV_IOSHIFT BPCSHIFT
> +#define BLKDEV_IOSHIFT PAGE_CACHE_SHIFT
>  #define BLKDEV_IOSIZE (1<<BLKDEV_IOSHIFT)
>  /* number of BB's per block device block */
>  #define BLKDEV_BB BTOBB(BLKDEV_IOSIZE)
>
> -/* bytes to clicks */
> -#define btoc(x) (((__psunsigned_t)(x)+(NBPC-1))>>BPCSHIFT)
> -#define btoct(x) ((__psunsigned_t)(x)>>BPCSHIFT)
> -#define btoc64(x) (((__uint64_t)(x)+(NBPC-1))>>BPCSHIFT)
> -#define btoct64(x) ((__uint64_t)(x)>>BPCSHIFT)
> -
> -/* off_t bytes to clicks */
> -#define offtoc(x) (((__uint64_t)(x)+(NBPC-1))>>BPCSHIFT)
> -#define offtoct(x) ((xfs_off_t)(x)>>BPCSHIFT)
> -
> -/* clicks to off_t bytes */
> -#define ctooff(x) ((xfs_off_t)(x)<<BPCSHIFT)
> -
> -/* clicks to bytes */
> -#define ctob(x) ((__psunsigned_t)(x)<<BPCSHIFT)
> -#define btoct(x) ((__psunsigned_t)(x)>>BPCSHIFT)
> -#define ctob64(x) ((__uint64_t)(x)<<BPCSHIFT)
> -
> -/* bytes to clicks */
> -#define btoc(x) (((__psunsigned_t)(x)+(NBPC-1))>>BPCSHIFT)
> -
>  #define ENOATTR ENODATA /* Attribute not found */
>  #define EWRONGFS EINVAL /* Mount with wrong filesystem type */
>  #define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
>
> ===========================================================================
> Index: fs/xfs/linux-2.6/xfs_lrw.c
> ===========================================================================
>
> --- a/fs/xfs/linux-2.6/xfs_lrw.c 2007-11-16 16:22:40.000000000 +1100
> +++ b/fs/xfs/linux-2.6/xfs_lrw.c 2007-11-15 11:32:12.451249159 +1100
> @@ -254,9 +254,8 @@ xfs_read(
>
>      if (unlikely(ioflags & IO_ISDIRECT)) {
>          if (VN_CACHED(vp))
> -            ret = xfs_flushinval_pages(ip,
> -                    ctooff(offtoct(*offset)),
> -                    -1, FI_REMAPF_LOCKED);
> +            ret = xfs_flushinval_pages(ip, (*offset & PAGE_CACHE_MASK),
> +                    -1, FI_REMAPF_LOCKED);
>          mutex_unlock(&inode->i_mutex);
>          if (ret) {
>              xfs_iunlock(ip, XFS_IOLOCK_SHARED);
> @@ -742,9 +741,9 @@ retry:
>          if (VN_CACHED(vp)) {
>              WARN_ON(need_i_mutex == 0);
>              xfs_inval_cached_trace(xip, pos, -1,
> -                    ctooff(offtoct(pos)), -1);
> +                    (pos & PAGE_CACHE_MASK), -1);
>              error = xfs_flushinval_pages(xip,
> -                    ctooff(offtoct(pos)),
> +                    (pos & PAGE_CACHE_MASK),
>                      -1, FI_REMAPF_LOCKED);
>              if (error)
>                  goto out_unlock_internal;
>
> ===========================================================================
> Index: fs/xfs/quota/xfs_qm.h
> ===========================================================================
>
> --- a/fs/xfs/quota/xfs_qm.h 2007-11-16 16:22:40.000000000 +1100
> +++ b/fs/xfs/quota/xfs_qm.h 2007-11-16 15:39:53.373375739 +1100
> @@ -52,8 +52,8 @@ extern kmem_zone_t *qm_dqtrxzone;
>  /*
>   * Dquot hashtable constants/threshold values.
>   */
> -#define XFS_QM_HASHSIZE_LOW (NBPP / sizeof(xfs_dqhash_t))
> -#define XFS_QM_HASHSIZE_HIGH ((NBPP * 4) / sizeof(xfs_dqhash_t))
> +#define XFS_QM_HASHSIZE_LOW (PAGE_SIZE / sizeof(xfs_dqhash_t))
> +#define XFS_QM_HASHSIZE_HIGH ((PAGE_SIZE * 4) / sizeof(xfs_dqhash_t))
>
>  /*
>   * This defines the unit of allocation of dquots.
>
> ===========================================================================
> Index: fs/xfs/xfs_bmap.c
> ===========================================================================
>
> --- a/fs/xfs/xfs_bmap.c 2007-11-16 16:22:41.000000000 +1100
> +++ b/fs/xfs/xfs_bmap.c 2007-11-16 15:40:27.017050666 +1100
> @@ -2830,11 +2830,11 @@ xfs_bmap_btalloc(
>              args.prod = align;
>              if ((args.mod = (xfs_extlen_t)do_mod(ap->off, args.prod)))
>                  args.mod = (xfs_extlen_t)(args.prod - args.mod);
> -        } else if (mp->m_sb.sb_blocksize >= NBPP) {
> +        } else if (mp->m_sb.sb_blocksize >= PAGE_CACHE_SIZE) {
>              args.prod = 1;
>              args.mod = 0;
>          } else {
> -            args.prod = NBPP >> mp->m_sb.sb_blocklog;
> +            args.prod = PAGE_CACHE_SIZE >> mp->m_sb.sb_blocklog;
>              if ((args.mod = (xfs_extlen_t)(do_mod(ap->off, args.prod))))
>                  args.mod = (xfs_extlen_t)(args.prod - args.mod);
>          }
>
> ===========================================================================
> Index: fs/xfs/xfs_itable.c
> ===========================================================================
>
> --- a/fs/xfs/xfs_itable.c 2007-11-16 16:22:40.000000000 +1100
> +++ b/fs/xfs/xfs_itable.c 2007-11-16 16:05:51.381600461 +1100
> @@ -393,7 +393,7 @@ xfs_bulkstat(
>          (XFS_INODE_CLUSTER_SIZE(mp) >> mp->m_sb.sb_inodelog);
>      nimask = ~(nicluster - 1);
>      nbcluster = nicluster >> mp->m_sb.sb_inopblog;
> -    irbuf = kmem_zalloc_greedy(&irbsize, NBPC, NBPC * 4,
> +    irbuf = kmem_zalloc_greedy(&irbsize, PAGE_SIZE, PAGE_SIZE * 4,
>                     KM_SLEEP | KM_MAYFAIL | KM_LARGE);
>      nirbuf = irbsize / sizeof(*irbuf);
>
> @@ -815,7 +815,7 @@ xfs_inumbers(
>      agino = XFS_INO_TO_AGINO(mp, ino);
>      left = *count;
>      *count = 0;
> -    bcount = MIN(left, (int)(NBPP / sizeof(*buffer)));
> +    bcount = MIN(left, (int)(PAGE_SIZE / sizeof(*buffer)));
>      buffer = kmem_alloc(bcount * sizeof(*buffer), KM_SLEEP);
>      error = bufidx = 0;
>      cur = NULL;
>
> ===========================================================================
> Index: fs/xfs/xfs_log.c
> ===========================================================================
>
> --- a/fs/xfs/xfs_log.c 2007-11-16 16:22:40.000000000 +1100
> +++ b/fs/xfs/xfs_log.c 2007-11-16 15:56:42.615917720 +1100
> @@ -1552,7 +1552,7 @@ xlog_dealloc_log(xlog_t *log)
>          tic = log->l_unmount_free;
>          while (tic) {
>              next_tic = tic->t_next;
> -            kmem_free(tic, NBPP);
> +            kmem_free(tic, PAGE_SIZE);
>              tic = next_tic;
>          }
>      }
> @@ -3161,13 +3161,13 @@ xlog_state_ticket_alloc(xlog_t *log)
>      xlog_ticket_t *t_list;
>      xlog_ticket_t *next;
>      xfs_caddr_t buf;
> -    uint i = (NBPP / sizeof(xlog_ticket_t)) - 2;
> +    uint i = (PAGE_SIZE / sizeof(xlog_ticket_t)) - 2;
>
>      /*
>       * The kmem_zalloc may sleep, so we shouldn't be holding the
>       * global lock. XXXmiken: may want to use zone allocator.
>       */
> -    buf = (xfs_caddr_t) kmem_zalloc(NBPP, KM_SLEEP);
> +    buf = (xfs_caddr_t) kmem_zalloc(PAGE_SIZE, KM_SLEEP);
>
>      spin_lock(&log->l_icloglock);
>
>
> ===========================================================================
> Index: fs/xfs/xfs_vnodeops.c
> ===========================================================================
>
> --- a/fs/xfs/xfs_vnodeops.c 2007-11-16 16:22:41.000000000 +1100
> +++ b/fs/xfs/xfs_vnodeops.c 2007-11-16 15:57:01.449505791 +1100
> @@ -4164,15 +4164,12 @@ xfs_free_file_space(
>          vn_iowait(ip); /* wait for the completion of any pending DIOs */
>      }
>
> -    rounding = max_t(uint, 1 << mp->m_sb.sb_blocklog, NBPP);
> +    rounding = max_t(uint, 1 << mp->m_sb.sb_blocklog, PAGE_CACHE_SIZE);
>      ioffset = offset & ~(rounding - 1);
>
>      if (VN_CACHED(vp) != 0) {
> -        xfs_inval_cached_trace(ip, ioffset, -1,
> -                ctooff(offtoct(ioffset)), -1);
> -        error = xfs_flushinval_pages(ip,
> -                ctooff(offtoct(ioffset)),
> -                -1, FI_REMAPF_LOCKED);
> +        xfs_inval_cached_trace(ip, ioffset, -1, ioffset, -1);
> +        error = xfs_flushinval_pages(ip, ioffset, -1, FI_REMAPF_LOCKED);
>          if (error)
>              goto out_unlock_iolock;
>      }
>
>
>

From owner-xfs@oss.sgi.com Thu Nov 15 22:19:26 2007
Received: with ECARTIS (v1.0.0; list xfs); Thu, 15 Nov 2007 22:19:29 -0800 (PST)
X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-1.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664
Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAG6JMdU017197 for ; Thu, 15 Nov 2007 22:19:24 -0800
Received: from timothy-shimmins-power-mac-g5.local (boing.melbourne.sgi.com [134.14.55.141]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA04544; Fri, 16 Nov 2007 17:19:24 +1100
Message-ID: <473D36AC.7040507@sgi.com>
Date: Fri, 16 Nov 2007 17:20:28 +1100
From: Timothy Shimmin
User-Agent:
Thunderbird 2.0.0.6 (Macintosh/20070728) MIME-Version: 1.0 To: Vlad Apostolov CC: Barry Naujok , "xfs@oss.sgi.com" , xfs-dev Subject: Re: REVIEW: xfs_reno #2 References: <473D32D9.2020500@sgi.com> In-Reply-To: <473D32D9.2020500@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4808/Thu Nov 15 16:53:47 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13685 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: tes@sgi.com Precedence: bulk X-list: xfs Vlad Apostolov wrote: > > When the XFS parent pointers feature is released we would need to find > out to update the EA to point to the new inode parent directory. This may > not be that easy though. > Really? Apart from the swapping of extents, reno uses standard calls doesn't it, in which case any movement of inodes will have the parent pointers updated by the normal vnode ops (e.g. mkdir, rename) in the kernel. 
--Tim From owner-xfs@oss.sgi.com Thu Nov 15 23:49:00 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 15 Nov 2007 23:49:03 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_52 autolearn=no version=3.3.0-r574664 Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAG7mtJt027772 for ; Thu, 15 Nov 2007 23:49:00 -0800 Received: from hch by pentafluge.infradead.org with local (Exim 4.63 #1 (Red Hat Linux)) id 1IsvXl-0004by-Mj; Fri, 16 Nov 2007 07:23:13 +0000 Date: Fri, 16 Nov 2007 07:23:13 +0000 From: Christoph Hellwig To: David Chinner Cc: Lachlan McIlroy , xfs-oss , xfs-dev Subject: Re: [PATCH, RFC] Move AIL pushing into a separate thread Message-ID: <20071116072313.GA17009@infradead.org> References: <20071105050706.GW66820511@sgi.com> <473BBDC1.2020107@sgi.com> <20071116004310.GL66820511@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071116004310.GL66820511@sgi.com> User-Agent: Mutt/1.4.2.3i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-Virus-Scanned: ClamAV 0.91.2/4808/Thu Nov 15 16:53:47 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13686 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs On Fri, Nov 16, 2007 at 11:43:10AM +1100, David Chinner wrote: > > >+ /* Too many items we can't do anything with? */ > > >+ if (stuck > 100) > > 100? Arbitrary magic number or was there reason for this? > > Arbitrary magic number based on observation. 
Basically, if
> we are skipping too many items because we can't flush them or
> they are already being flushed we back off and give them time
> to complete whatever operation is being done. i.e. remove pressure
> from the AIL while we can't make progress so traversals don't
> slow down further inserts and removals to/from the AIL.

Might be worth adding something like this as a comment.

From owner-xfs@oss.sgi.com Fri Nov 16 01:17:33 2007
Received: with ECARTIS (v1.0.0; list xfs); Fri, 16 Nov 2007 01:17:38 -0800 (PST)
X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,SPF_HELO_PASS autolearn=ham version=3.3.0-r574664
Received: from mail.g-house.de (ns2.g-housing.de [81.169.133.75]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAG9HVrG016988 for ; Fri, 16 Nov 2007 01:17:33 -0800
Received: from [89.59.1.239] (helo=sheep.housecafe.de) by mail.g-house.de with esmtpsa (TLS-1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.63) (envelope-from ) id 1IsxMt-00025b-O7; Fri, 16 Nov 2007 10:20:08 +0100
Received: from localhost ([127.0.0.1] helo=housecafe.dyndns.org) by sheep.housecafe.de with esmtp (Exim 4.68) (envelope-from ) id 1IsxK9-00046G-L4; Fri, 16 Nov 2007 10:17:17 +0100
Received: from 62.180.231.196 (SquirrelMail authenticated user evil) by housecafe.dyndns.org with HTTP; Fri, 16 Nov 2007 10:17:17 +0100 (CET)
Message-ID: <59468.62.180.231.196.1195204637.squirrel@housecafe.dyndns.org>
In-Reply-To: <20071116003410.GA16797@puku.stupidest.org>
References: <20071114070400.GA25708@puku.stupidest.org> <473AA72C.6020308@panasas.com> <20071114125907.GB4010@fieldses.org> <20071116003410.GA16797@puku.stupidest.org>
Date: Fri, 16 Nov 2007 10:17:17 +0100 (CET)
Subject: Re: 2.6.24-rc2 XFS nfsd hang
From: "Christian Kujau"
To: "Chris Wedgwood"
Cc: "LKML" , "J.
Bruce Fields" , "Benny Halevy" , linux-xfs@oss.sgi.com User-Agent: SquirrelMail/1.5.2 [SVN] MIME-Version: 1.0 Content-Type: text/plain;charset=ISO-8859-15 Content-Transfer-Encoding: 8bit X-Virus-Scanned: ClamAV 0.91.2/4809/Thu Nov 15 23:22:39 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13687 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lists@nerdbynature.de Precedence: bulk X-list: xfs On Fri, November 16, 2007 01:34, Chris Wedgwood wrote: > I'm not sure what you're doing here, but a viable work-around for now > might be to use nfsv2 mounts, something like > > mount -o vers=2 ... > or to keep v3 and disable readdirplus doing something like: > mount -o vers=3,nordirplus ... OK, I'll try this. I hope this can be fixed somehow before 2.6.24... Thank you for your time, Christian. -- BOFH excuse #442: Trojan horse ran out of hay From owner-xfs@oss.sgi.com Fri Nov 16 03:03:12 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 16 Nov 2007 03:03:18 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from smtp118.sbc.mail.sp1.yahoo.com (smtp118.sbc.mail.sp1.yahoo.com [69.147.64.91]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAGB3B0g032644 for ; Fri, 16 Nov 2007 03:03:12 -0800 Received: (qmail 29069 invoked from network); 16 Nov 2007 11:03:17 -0000 Received: from unknown (HELO stupidest.org) (cwedgwood@sbcglobal.net@75.36.198.62 with login) by smtp118.sbc.mail.sp1.yahoo.com with SMTP; 16 Nov 2007 11:03:17 -0000 X-YMail-OSG: U119a4MVM1mJuVtDo642ArtWca3edY5hfWHGQExThg4plGKMv8vgKS5dGDSgwzgCcR74l6EUlQ-- Received: by tuatara.stupidest.org (Postfix, from userid 10000) id 7D2F32810B8A; Fri, 16 Nov 2007 03:03:15 -0800 (PST) Date: Fri, 16 Nov 2007 03:03:15 -0800 From: Chris Wedgwood To: Christian Kujau Cc: LKML 
 , "J. Bruce Fields" , Benny Halevy , linux-xfs@oss.sgi.com
Subject: Re: 2.6.24-rc2 XFS nfsd hang
Message-ID: <20071116110315.GA27969@puku.stupidest.org>
References: <20071114070400.GA25708@puku.stupidest.org> <473AA72C.6020308@panasas.com> <20071114125907.GB4010@fieldses.org> <20071116003410.GA16797@puku.stupidest.org> <59468.62.180.231.196.1195204637.squirrel@housecafe.dyndns.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <59468.62.180.231.196.1195204637.squirrel@housecafe.dyndns.org>
X-Virus-Scanned: ClamAV 0.91.2/4810/Fri Nov 16 00:38:11 2007 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 13688
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: cw@f00f.org
Precedence: bulk
X-list: xfs

On Fri, Nov 16, 2007 at 10:17:17AM +0100, Christian Kujau wrote:

> OK, I'll try this. I hope this can be fixed somehow before 2.6.24...

Well, one simple nasty idea would be something like:

diff --git a/fs/Kconfig b/fs/Kconfig
index 429a002..da231fd 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1604,7 +1604,7 @@ config NFS_FS
 
 config NFS_V3
     bool "Provide NFSv3 client support"
-    depends on NFS_FS
+    depends on NFS_FS && !XFS
     help
       Say Y here if you want your NFS client to be able to speak version
       3 of the NFS protocol.

So people who are likely to be affected just side-step the issue until
it's resolved.
From owner-xfs@oss.sgi.com Fri Nov 16 06:56:11 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 16 Nov 2007 06:56:15 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from pat.uio.no (pat.uio.no [129.240.10.15]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAGEu8dL008078 for ; Fri, 16 Nov 2007 06:56:10 -0800 Received: from mail-mx2.uio.no ([129.240.10.30]) by pat.uio.no with esmtp (Exim 4.67) (envelope-from ) id 1It22g-0006yY-OD; Fri, 16 Nov 2007 15:19:34 +0100 Received: from smtp.uio.no ([129.240.10.9] helo=mail-mx2.uio.no) by mail-mx2.uio.no with esmtp (Exim 4.67) (envelope-from ) id 1It22g-0005eH-8L; Fri, 16 Nov 2007 15:19:34 +0100 Received: from c-69-242-210-120.hsd1.mi.comcast.net ([69.242.210.120] helo=[192.168.0.101]) by mail-mx2.uio.no with esmtpsa (SSLv3:AES256-SHA:256) (Exim 4.67) (envelope-from ) id 1It22f-0005dd-Nd; Fri, 16 Nov 2007 15:19:34 +0100 Subject: Re: 2.6.24-rc2 XFS nfsd hang From: Trond Myklebust To: Chris Wedgwood Cc: Christian Kujau , LKML , "J. 
Bruce Fields" , Benny Halevy , linux-xfs@oss.sgi.com In-Reply-To: <20071116110315.GA27969@puku.stupidest.org> References: <20071114070400.GA25708@puku.stupidest.org> <473AA72C.6020308@panasas.com> <20071114125907.GB4010@fieldses.org> <20071116003410.GA16797@puku.stupidest.org> <59468.62.180.231.196.1195204637.squirrel@housecafe.dyndns.org> <20071116110315.GA27969@puku.stupidest.org> Content-Type: text/plain Date: Fri, 16 Nov 2007 09:19:32 -0500 Message-Id: <1195222772.7653.16.camel@heimdal.trondhjem.org> Mime-Version: 1.0 X-Mailer: Evolution 2.12.1 Content-Transfer-Encoding: 7bit X-UiO-Resend: resent X-UiO-ClamAV-Virus: No X-UiO-Spam-info: not spam, SpamAssassin (score=-0.1, required=12.0, autolearn=disabled, AWL=-0.087) X-UiO-Scanned: 2AB1C2656C633C10A506D707CAF5A06631341EC6 X-UiO-SPAM-Test: remote_host: 129.240.10.9 spam_score: 0 maxlevel 200 minaction 2 bait 0 mail/h: 626 total 5181029 max/h 8345 blacklist 0 greylist 0 ratelimit 0 X-Virus-Scanned: ClamAV 0.91.2/4810/Fri Nov 16 00:38:11 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13689 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: trond.myklebust@fys.uio.no Precedence: bulk X-list: xfs On Fri, 2007-11-16 at 03:03 -0800, Chris Wedgwood wrote: > On Fri, Nov 16, 2007 at 10:17:17AM +0100, Christian Kujau wrote: > > > OK, I'll try this. I hope this can be fixed somehow before 2.6.24... > > Well, one simple nasty idea would be something like: > > diff --git a/fs/Kconfig b/fs/Kconfig > index 429a002..da231fd 100644 > --- a/fs/Kconfig > +++ b/fs/Kconfig > @@ -1604,7 +1604,7 @@ config NFS_FS > > config NFS_V3 > bool "Provide NFSv3 client support" > - depends on NFS_FS > + depends on NFS_FS && !XFS > help > Say Y here if you want your NFS client to be able to speak version > 3 of the NFS protocol. > > So people who are likely to be affect just side-step the issue until > it's resolved. 
Very funny, but disabling XFS on the client won't help. Trond From owner-xfs@oss.sgi.com Fri Nov 16 13:43:23 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 16 Nov 2007 13:43:29 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from smtp117.sbc.mail.sp1.yahoo.com (smtp117.sbc.mail.sp1.yahoo.com [69.147.64.90]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAGLhLIP018459 for ; Fri, 16 Nov 2007 13:43:22 -0800 Received: (qmail 99869 invoked from network); 16 Nov 2007 21:43:28 -0000 Received: from unknown (HELO stupidest.org) (cwedgwood@sbcglobal.net@24.5.75.45 with login) by smtp117.sbc.mail.sp1.yahoo.com with SMTP; 16 Nov 2007 21:43:28 -0000 X-YMail-OSG: 3oDnVRoVM1ktgIC6r4WPgaJ4t6NBY7SaxXbqZOGW4HIaZJylbBf2Le4U3WIUlywvTuhF7l8yTQ-- Received: by tuatara.stupidest.org (Postfix, from userid 10000) id 275B3280812D; Fri, 16 Nov 2007 13:43:27 -0800 (PST) Date: Fri, 16 Nov 2007 13:43:27 -0800 From: Chris Wedgwood To: Trond Myklebust Cc: Christian Kujau , LKML , "J. 
Bruce Fields" , Benny Halevy , linux-xfs@oss.sgi.com Subject: Re: 2.6.24-rc2 XFS nfsd hang Message-ID: <20071116214327.GA9685@puku.stupidest.org> References: <20071114070400.GA25708@puku.stupidest.org> <473AA72C.6020308@panasas.com> <20071114125907.GB4010@fieldses.org> <20071116003410.GA16797@puku.stupidest.org> <59468.62.180.231.196.1195204637.squirrel@housecafe.dyndns.org> <20071116110315.GA27969@puku.stupidest.org> <1195222772.7653.16.camel@heimdal.trondhjem.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1195222772.7653.16.camel@heimdal.trondhjem.org> X-Virus-Scanned: ClamAV 0.91.2/4817/Fri Nov 16 12:06:29 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13690 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: cw@f00f.org Precedence: bulk X-list: xfs On Fri, Nov 16, 2007 at 09:19:32AM -0500, Trond Myklebust wrote: > Very funny, but disabling XFS on the client won't help. Oops, I meant it for NFSD... and I'm somewhat serious. I'm not saying it's a good long term solution, but a potentially safer short-term workaround. 
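
For reference, the client-side workaround suggested earlier in this thread (NFSv2, or NFSv3 with readdirplus disabled) can be sketched as a small shell helper. This is only an illustration: the function name nfs_workaround_cmd, the export server:/export and the mount point /mnt/nfs are placeholders that do not come from the thread, and the helper prints the mount command instead of running it.

```shell
#!/bin/sh
# Sketch only: build the mount command for the two workarounds
# mentioned in the thread ("vers=2" and "vers=3,nordirplus").
# All names and paths below are hypothetical.

nfs_workaround_cmd() {
    mode="$1"          # "v2" or "v3-nordirplus"
    export_path="$2"   # e.g. server:/export (placeholder)
    mount_point="$3"   # e.g. /mnt/nfs (placeholder)

    case "$mode" in
        v2)            opts="vers=2" ;;
        v3-nordirplus) opts="vers=3,nordirplus" ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac

    # Print the command so it can be reviewed before being run as root.
    echo "mount -t nfs -o $opts $export_path $mount_point"
}
```

For example, nfs_workaround_cmd v3-nordirplus server:/export /mnt/nfs prints a mount line that keeps NFSv3 but avoids readdirplus on that client.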
From owner-xfs@oss.sgi.com Fri Nov 16 21:13:15 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 16 Nov 2007 21:13:26 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.3 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_45, SPF_HELO_FAIL autolearn=no version=3.3.0-r574664 Received: from mxmail.synplicity.com (synvpn.synplicity.com [209.157.48.1]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAH5DCsi006562 for ; Fri, 16 Nov 2007 21:13:14 -0800 X-IronPort-AV: E=Sophos;i="4.21,429,1188802800"; d="scan'208";a="683939" Received: from mailhost.synplicity.com (HELO synplcty.synplicity.com) ([209.24.66.180]) by mxmail.synplicity.com with ESMTP; 16 Nov 2007 21:13:18 -0800 Received: from [63.110.200.48] (localhost [127.0.0.1]) by synplcty.synplicity.com (8.13.1/8.12.11) with ESMTP id lAH5DG8M024568; Fri, 16 Nov 2007 21:13:17 -0800 (PST) Message-ID: <473E7870.7070901@synplicity.com> Date: Fri, 16 Nov 2007 21:13:20 -0800 From: Chris Eddington User-Agent: Thunderbird 2.0.0.6 (Windows/20070728) MIME-Version: 1.0 To: David Chinner CC: "xfs@oss.sgi.com" Subject: Re: xfs_repair - what's the damage? References: <4739F2CD.2020800@synplicity.com> <20071113210853.GY995458@sgi.com> In-Reply-To: <20071113210853.GY995458@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4822/Fri Nov 16 17:26:56 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13691 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: chrise@synplicity.com Precedence: bulk X-list: xfs Thanks David. I had a 2-port SATA failure on a RAID5 array. I've got all disks up and running again , but in a RAID5 degraded state (3 out of 4 disks). The logs tell me one port failed and then the other 8 hours later and the raid system shut down immediately. 
So I'm very surprised so much data is lost, as the machine was pretty idle when the failures occurred. I'm thinking that maybe the array got assembled incorrectly somehow. Thanks, Chris David Chinner wrote: > On Tue, Nov 13, 2007 at 10:54:05AM -0800, Chris Eddington wrote: > >> Hi, >> >> Can someone point me to instructions on how to understand the scope of >> damage to this filesystem based on the output from xfs_repair below? >> What is it repairing, and what data is lost? I'm not sure how to interpret >> these messages or where to go to find out. >> > > Looks like you had something write crap over various parts of > the filesystem. Both AG 2 and AG 24 have header problems, and > then there's a bunch of freespace and allocated inode problems > because the indexes were lost due to the header corruption. > > Who knows how much else is broken - it depends on how much > bad data got written into the filesystem. Best you can do is > to run xfs_repair and sift through the debris in lost+found > and try to work out what the lost data is... > > As I always ask - how did the filesystem get into this state? > > Cheers, > > Dave.
> From owner-xfs@oss.sgi.com Sun Nov 18 06:44:56 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 18 Nov 2007 06:45:02 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.3 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS autolearn=ham version=3.3.0-r574664 Received: from mail.g-house.de (ns2.g-housing.de [81.169.133.75]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAIEisUW008772 for ; Sun, 18 Nov 2007 06:44:55 -0800 Received: from [89.49.132.203] (helo=[192.168.178.25]) by mail.g-house.de with esmtpsa (TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.63) (envelope-from ) id 1ItlQs-0001m9-Qv; Sun, 18 Nov 2007 15:47:35 +0100 Date: Sun, 18 Nov 2007 15:44:34 +0100 (CET) From: Christian Kujau X-X-Sender: evil@sheep.housecafe.de To: Chris Wedgwood cc: Trond Myklebust , LKML , "J. Bruce Fields" , Benny Halevy , linux-xfs@oss.sgi.com Subject: Re: 2.6.24-rc2 XFS nfsd hang In-Reply-To: <20071116214327.GA9685@puku.stupidest.org> Message-ID: References: <20071114070400.GA25708@puku.stupidest.org> <473AA72C.6020308@panasas.com> <20071114125907.GB4010@fieldses.org> <20071116003410.GA16797@puku.stupidest.org> <59468.62.180.231.196.1195204637.squirrel@housecafe.dyndns.org> <20071116110315.GA27969@puku.stupidest.org> <1195222772.7653.16.camel@heimdal.trondhjem.org> <20071116214327.GA9685@puku.stupidest.org> User-Agent: Alpine 0.99999 (DEB 796 2007-11-08) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Virus-Scanned: ClamAV 0.91.2/4832/Sat Nov 17 17:07:36 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13692 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lists@nerdbynature.de Precedence: bulk X-list: xfs On Fri, 16 Nov 2007, Chris Wedgwood wrote: > Oops, I meant it for NFSD... and I'm somewhat serious. 
I'm not > saying it's a good long term solution, but a potentially safer > short-term workaround. I've opened http://bugzilla.kernel.org/show_bug.cgi?id=9400 to track this one (and to not forget about it :)). I wonder why so few people are seeing this, I'd have assumed that NFSv3 && XFS is not sooo exotic... Christian. -- BOFH excuse #273: The cord jumped over and hit the power switch. From owner-xfs@oss.sgi.com Sun Nov 18 07:31:21 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 18 Nov 2007 07:31:26 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-0.8 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS autolearn=ham version=3.3.0-r574664 Received: from lucidpixels.com (lucidpixels.com [75.144.35.66]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAIFVJPt014723 for ; Sun, 18 Nov 2007 07:31:20 -0800 Received: by lucidpixels.com (Postfix, from userid 1001) id E2C541C000263; Sun, 18 Nov 2007 10:31:26 -0500 (EST) Received: from localhost (localhost [127.0.0.1]) by lucidpixels.com (Postfix) with ESMTP id E19DF4019345; Sun, 18 Nov 2007 10:31:26 -0500 (EST) Date: Sun, 18 Nov 2007 10:31:26 -0500 (EST) From: Justin Piszcz X-X-Sender: jpiszcz@p34.internal.lan To: Christian Kujau cc: Chris Wedgwood , Trond Myklebust , LKML , "J. 
Bruce Fields" , Benny Halevy , linux-xfs@oss.sgi.com Subject: Re: 2.6.24-rc2 XFS nfsd hang In-Reply-To: Message-ID: References: <20071114070400.GA25708@puku.stupidest.org> <473AA72C.6020308@panasas.com> <20071114125907.GB4010@fieldses.org> <20071116003410.GA16797@puku.stupidest.org> <59468.62.180.231.196.1195204637.squirrel@housecafe.dyndns.org> <20071116110315.GA27969@puku.stupidest.org> <1195222772.7653.16.camel@heimdal.trondhjem.org> <20071116214327.GA9685@puku.stupidest.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Virus-Scanned: ClamAV 0.91.2/4832/Sat Nov 17 17:07:36 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13693 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jpiszcz@lucidpixels.com Precedence: bulk X-list: xfs On Sun, 18 Nov 2007, Christian Kujau wrote: > On Fri, 16 Nov 2007, Chris Wedgwood wrote: >> Oops, I meant it for NFSD... and I'm somewhat serious. I'm not >> saying it's a good long term solution, but a potentially safer >> short-term workaround. > > I've opened http://bugzilla.kernel.org/show_bug.cgi?id=9400 to track this one > (and to not forget about it :)). > > I wonder why so few people are seeing this, I'd have assumed that > NFSv3 && XFS is not sooo exotic... Still on 2.6.23.x here (also use nfsv3 + xfs). > > Christian. > -- > BOFH excuse #273: > > The cord jumped over and hit the power switch. 
> > From owner-xfs@oss.sgi.com Sun Nov 18 14:07:44 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 18 Nov 2007 14:07:52 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS autolearn=ham version=3.3.0-r574664 Received: from mail.g-house.de (ns2.g-housing.de [81.169.133.75]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAIM7ffZ014784 for ; Sun, 18 Nov 2007 14:07:44 -0800 Received: from [89.49.132.203] (helo=[192.168.178.25]) by mail.g-house.de with esmtpsa (TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.63) (envelope-from ) id 1ItsLM-0005C6-M7; Sun, 18 Nov 2007 23:10:21 +0100 Date: Sun, 18 Nov 2007 23:07:19 +0100 (CET) From: Christian Kujau X-X-Sender: evil@sheep.housecafe.de To: Justin Piszcz cc: Chris Wedgwood , Trond Myklebust , LKML , "J. Bruce Fields" , Benny Halevy , linux-xfs@oss.sgi.com Subject: Re: 2.6.24-rc2 XFS nfsd hang In-Reply-To: Message-ID: References: <20071114070400.GA25708@puku.stupidest.org> <473AA72C.6020308@panasas.com> <20071114125907.GB4010@fieldses.org> <20071116003410.GA16797@puku.stupidest.org> <59468.62.180.231.196.1195204637.squirrel@housecafe.dyndns.org> <20071116110315.GA27969@puku.stupidest.org> <1195222772.7653.16.camel@heimdal.trondhjem.org> <20071116214327.GA9685@puku.stupidest.org> User-Agent: Alpine 0.99999 (DEB 796 2007-11-08) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Virus-Scanned: ClamAV 0.91.2/4832/Sat Nov 17 17:07:36 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13695 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lists@nerdbynature.de Precedence: bulk X-list: xfs On Sun, 18 Nov 2007, Justin Piszcz wrote: >> I wonder why so few people are seeing this, I'd have assumed that >> NFSv3 && XFS is not sooo exotic...
> Still on 2.6.23.x here (also use nfsv3 + xfs). So, it's the "too few people are testing -rc kernels" issue again :( Christian. -- BOFH excuse #118: the router thinks its a printer. From owner-xfs@oss.sgi.com Sun Nov 18 15:13:36 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 18 Nov 2007 15:13:41 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.1 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAINDVsl032445 for ; Sun, 18 Nov 2007 15:13:35 -0800 Received: from [134.14.55.89] (soarer.melbourne.sgi.com [134.14.55.89]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA15469; Mon, 19 Nov 2007 10:13:34 +1100 Message-ID: <4740C727.4050008@sgi.com> Date: Mon, 19 Nov 2007 10:13:43 +1100 From: Vlad Apostolov User-Agent: Thunderbird 2.0.0.6 (X11/20070728) MIME-Version: 1.0 To: Timothy Shimmin CC: Barry Naujok , "xfs@oss.sgi.com" , xfs-dev Subject: Re: REVIEW: xfs_reno #2 References: <473D32D9.2020500@sgi.com> <473D36AC.7040507@sgi.com> In-Reply-To: <473D36AC.7040507@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4835/Sun Nov 18 14:21:32 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13696 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: vapo@sgi.com Precedence: bulk X-list: xfs Timothy Shimmin wrote: > Vlad Apostolov wrote: >> >> When the XFS parent pointers feature is released we would need to find >> out to update the EA to point to the new inode parent directory. This >> may >> not be that easy though. >> > Really? 
> Apart from the swapping of extents, reno uses standard calls doesn't it, > in which case any movement of inodes will have the parent pointers > updated by the normal vnode ops (e.g. mkdir, rename) in the kernel. > > --Tim When a 64 bits parent inode directory is changed to 32 bits inode, I couldn't see code that would change parent pointer EA of the children to point to the new 32 bits parent. Please correct me if I missed something. Regards, Vlad From owner-xfs@oss.sgi.com Sun Nov 18 15:19:12 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 18 Nov 2007 15:19:15 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.1 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAINJ7Vg001391 for ; Sun, 18 Nov 2007 15:19:11 -0800 Received: from [134.14.55.89] (soarer.melbourne.sgi.com [134.14.55.89]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA15753; Mon, 19 Nov 2007 10:19:10 +1100 Message-ID: <4740C878.7090809@sgi.com> Date: Mon, 19 Nov 2007 10:19:20 +1100 From: Vlad Apostolov User-Agent: Thunderbird 2.0.0.6 (X11/20070728) MIME-Version: 1.0 To: Timothy Shimmin CC: Barry Naujok , "xfs@oss.sgi.com" , xfs-dev Subject: Re: REVIEW: xfs_reno #2 References: <473D32D9.2020500@sgi.com> <473D36AC.7040507@sgi.com> <4740C727.4050008@sgi.com> In-Reply-To: <4740C727.4050008@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4835/Sun Nov 18 14:21:32 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13697 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: vapo@sgi.com Precedence: bulk X-list: xfs Vlad Apostolov wrote: > Timothy Shimmin wrote: >> 
Vlad Apostolov wrote: >>> >>> When the XFS parent pointers feature is released we would need to find >>> out to update the EA to point to the new inode parent directory. >>> This may >>> not be that easy though. >>> >> Really? >> Apart from the swapping of extents, reno uses standard calls doesn't it, >> in which case any movement of inodes will have the parent pointers >> updated by the normal vnode ops (e.g. mkdir, rename) in the kernel. >> >> --Tim > When a 64 bits parent inode directory is changed to 32 bits inode, > I couldn't see code that would change parent pointer EA of the > children to point to the new 32 bits parent. Please correct me if > I missed something. > > Regards, > Vlad > Or maybe renaming a file under the new parent will update the EA parent pointer to the new parent. From owner-xfs@oss.sgi.com Sun Nov 18 19:02:42 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 18 Nov 2007 19:02:44 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAJ32dDa030610 for ; Sun, 18 Nov 2007 19:02:41 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA19794; Mon, 19 Nov 2007 14:02:45 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAJ32jdD111848864; Mon, 19 Nov 2007 14:02:45 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAJ32iMN111361865; Mon, 19 Nov 2007 14:02:44 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Mon, 19 Nov 2007 14:02:44 +1100 From: David 
Chinner To: Lachlan McIlroy Cc: David Chinner , xfs-dev , xfs-oss Subject: Re: [PATCH] bulkstat fixups Message-ID: <20071119030244.GR66820511@sgi.com> References: <4733EEF2.9010504@sgi.com> <20071111214759.GS995458@sgi.com> <4737C11D.8030007@sgi.com> <20071112041121.GT66820511@sgi.com> <473D1DE0.1090106@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <473D1DE0.1090106@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4838/Sun Nov 18 17:56:13 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13698 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Fri, Nov 16, 2007 at 03:34:40PM +1100, Lachlan McIlroy wrote: > Updated patch - I added cond_resched() calls into each loop - for loops that > have a 'continue' somewhere in them I added the cond_resched() at the start, > otherwise I put it at the end. You probably don't need the call in the innermost loop (the walking across the inode cluster). > >>>Userspace visible change. What applications do we have that rely on this > >>>behaviour that will be broken by this change? > >>Any apps that rely on the existing behaviour are probably broken. If an > >>app > >>wants to call xfs_bulkstat_single() it should use > >>XFS_IOC_FSBULKSTAT_SINGLE. > > > >Perhaps, but we can't arbitrarily decide that those apps will now break on > >a new kernel with this change. At minimum we need to audit all of the code > >we have that uses bulkstat for such breakage (including DMF!) before we > >make a > >change like this. > > I've looked through everything we have in xfs-cmds and nothing relies on > this bug being present. Vlad helped me with the DMF side - DMF does not > use the XFS_IOC_FSBULKSTAT ioctl, it has its own interface into the kernel > which calls xfs_bulkstat() directly so it won't be affected by this change.
Sounds like it really is a bug as nothing is trying to exploit that behaviour. Ok, seems fair to fix it. Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Sun Nov 18 19:50:04 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 18 Nov 2007 19:50:08 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.3 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_31, J_CHICKENPOX_63,J_CHICKENPOX_64,J_CHICKENPOX_73 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAJ3o1m2003768 for ; Sun, 18 Nov 2007 19:50:02 -0800 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA20628; Mon, 19 Nov 2007 14:50:03 +1100 Message-ID: <47410796.6090403@sgi.com> Date: Mon, 19 Nov 2007 14:48:38 +1100 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.9 (X11/20071031) MIME-Version: 1.0 To: Barry Naujok CC: "xfs@oss.sgi.com" , xfs-dev Subject: Re: REVIEW: xfs_reno #2 References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4838/Sun Nov 18 17:56:13 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13699 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs +#define NH_BUCKETS 65536 +#define NH_HASH(ino) (nodehash + ((ino) % NH_BUCKETS)) Hashing addresses by a power of two often results in an uneven distribution over the hashtable. Have you verified your hashing algorithm? 
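The question about the hash distribution can be checked empirically. The following is only a minimal sketch and not part of the posted patch: it counts how many buckets are actually hit when strided keys are reduced modulo NH_BUCKETS directly, and again after a mixing step. The helper names (mix64, count_used_buckets) and the strided key pattern are assumptions made for illustration; the mixer is the 64-bit finalizer from MurmurHash3, swapped in here as one possible choice.

```c
#include <stdint.h>
#include <stdlib.h>

#define NH_BUCKETS 65536	/* same bucket count as the patch */

/* Hypothetical mixer (MurmurHash3 64-bit finalizer); not in xfs_reno. */
static uint64_t mix64(uint64_t x)
{
	x ^= x >> 33;
	x *= 0xff51afd7ed558ccdULL;
	x ^= x >> 33;
	x *= 0xc4ceb9fe1a85ec53ULL;
	x ^= x >> 33;
	return x;
}

/*
 * Hash nkeys keys of the form start + i * stride and return how many
 * distinct buckets receive at least one key. Returns 0 if the scratch
 * allocation fails.
 */
static size_t count_used_buckets(uint64_t start, uint64_t stride,
				 size_t nkeys, int use_mixer)
{
	unsigned char *used = calloc(NH_BUCKETS, 1);
	size_t i, nused = 0;

	if (used == NULL)
		return 0;

	for (i = 0; i < nkeys; i++) {
		uint64_t key = start + (uint64_t)i * stride;
		uint64_t h = use_mixer ? mix64(key) : key;
		size_t bucket = (size_t)(h % NH_BUCKETS);

		if (!used[bucket]) {
			used[bucket] = 1;
			nused++;
		}
	}

	free(used);
	return nused;
}
```

With a stride of 64 and no mixing, count_used_buckets(0, 64, 65536, 0) can only ever reach 65536 / 64 = 1024 buckets, because every key is a multiple of 64. Whether real inode numbers show that kind of regularity would need to be measured on an actual filesystem.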
+static nodelist_t * +init_nodehash(void) +{ + int i; + + nodehash = calloc(NH_BUCKETS, sizeof(nodelist_t)); + if (nodehash == NULL) { + err_nomem(); + return NULL; + } + + for (i = 0; i < NH_BUCKETS; i++) { + nodehash[i].nodes = NULL; + nodehash[i].lastnode = 0; + nodehash[i].listlen = 0; + } No need to do this, calloc() zeroed the memory. + + return nodehash; +} +static nlink_t +add_path( + bignode_t *node, + const char *path) +{ + node->paths = realloc(node->paths, + sizeof(char *) * (node->numpaths + 1)); Lots of little allocations here, realloc()'ing for space for one more pointer is inefficient. Can we alloc a chunk of pointers? Just how many path pointers do we typically need? Can we add an array of initial pointers into bignode_t and when we exceed that start allocating more chunks here? + if (node->paths == NULL) { + err_nomem(); + exit(1); + } + + node->paths[node->numpaths] = strdup(path); More little allocations. Can we preallocate a chunk of memory and strcpy() the paths into it? The array of path pointers would then be indexes into the memory. + if (node->paths[node->numpaths] == NULL) { + err_nomem(); + exit(1); + } + + node->numpaths++; + if (node->numpaths > highest_numpaths) + highest_numpaths = node->numpaths; + + return node->numpaths; +} +static bignode_t * +add_node( + nodelist_t *list, + xfs_ino_t ino, + int ftw_flags, + const char *path) +{ + bignode_t *node; + + if (list->lastnode >= list->listlen) { + list->listlen += 500; + list->nodes = realloc(list->nodes, + sizeof(bignode_t) * list->listlen); Can we avoid the realloc()? (realloc() may need to copy the data to a new location if it cannot extend the current allocation.) For example each chunk of 500 nodes could end in a pointer to the next chunk. 
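The chunk-chaining idea suggested above could look roughly like this. It is a sketch under stated assumptions, not code from the patch: node_t is a simplified stand-in for bignode_t, and chunk_t, chunklist_t and the chunklist_* functions are hypothetical names. The point it shows is that growing the list never copies existing nodes, so pointers handed out earlier stay valid.

```c
#include <stdlib.h>

#define CHUNK_NODES 500		/* same growth step as the patch */

/* Simplified stand-in for the patch's bignode_t; only the key is kept. */
typedef struct node {
	unsigned long long ino;
} node_t;

/* A fixed-size block of nodes that links to the next block. */
typedef struct chunk {
	node_t		nodes[CHUNK_NODES];
	int		used;	/* slots filled in this chunk */
	struct chunk	*next;
} chunk_t;

typedef struct chunklist {
	chunk_t	*head;
	chunk_t	*tail;
} chunklist_t;

/* Append a node. Returns NULL if a new chunk cannot be allocated. */
static node_t *chunklist_add(chunklist_t *list, unsigned long long ino)
{
	chunk_t *c = list->tail;

	if (c == NULL || c->used == CHUNK_NODES) {
		c = calloc(1, sizeof(*c));
		if (c == NULL)
			return NULL;
		if (list->tail != NULL)
			list->tail->next = c;
		else
			list->head = c;
		list->tail = c;
	}

	c->nodes[c->used].ino = ino;
	return &c->nodes[c->used++];
}

/* Linear search across all chunks, mirroring find_node() in the patch. */
static node_t *chunklist_find(const chunklist_t *list, unsigned long long ino)
{
	chunk_t *c;
	int i;

	for (c = list->head; c != NULL; c = c->next)
		for (i = 0; i < c->used; i++)
			if (c->nodes[i].ino == ino)
				return &c->nodes[i];
	return NULL;
}

/* Release every chunk and reset the list to empty. */
static void chunklist_free(chunklist_t *list)
{
	chunk_t *c = list->head;

	while (c != NULL) {
		chunk_t *next = c->next;
		free(c);
		c = next;
	}
	list->head = list->tail = NULL;
}
```

The trade-off is that the nodes are no longer one contiguous array, so indexing by position or a binary search over the whole list would need extra bookkeeping.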
+ if (list->nodes == NULL) { + err_nomem(); + return NULL; + } + } +static bignode_t * +find_node( + xfs_ino_t ino) +{ + int i; + nodelist_t *nodelist; + bignode_t *nodes; + + nodelist = NH_HASH(ino); + nodes = nodelist->nodes; + + for(i = 0; i < nodelist->lastnode; i++) { + if (nodes[i].ino == ino) { By any chance do we read inodes in ascending order? Or can they be in random order? If they are in ascending order then we could binary search here. If not, and we call find_node() a lot, then it might be worth sorting each list of nodes. + return &nodes[i]; + } + } + + return NULL; +} + bignode_t *nodes = nodehash[i].nodes; + for (j = 0; j < nodehash[i].lastnode; j++, nodes++) + dump_node("nodehash", nodes); You have this code in various places. You may be able to save a few cycles by dropping the loop counter. Note I have invented listcount to be the actual number of nodes in the list. bignode_t *nodes = nodehash[i].nodes; bignode_t *lastnode = nodes + nodehash[i].listcount; for (; nodes < lastnode; nodes++) dump_node("nodehash", nodes); +static int +clone_attribs( + char *source, + char *target) +{ + char list_buf[ATTRBUFSIZE]; May not be an issue putting 1k on stack here but could be a global allocated on startup. + char *attr_buf; + int rval; + + attr_buf = malloc(ATTR_MAX_VALUELEN * 2); Could do this allocation on startup too - one less failure case to worry about. + if (attr_buf == NULL) { + err_nomem(); + return -1; + } + rval = attr_clone_copy(source, target, list_buf, attr_buf, + ATTR_MAX_VALUELEN * 2, 0); + if (rval == 0) + rval = attr_clone_copy(source, target, list_buf, attr_buf, + ATTR_MAX_VALUELEN * 2, ATTR_ROOT); + if (rval == 0) + rval = attr_clone_copy(source, target, list_buf, attr_buf, + ATTR_MAX_VALUELEN * 2, ATTR_SECURE); + free(attr_buf); + return rval; +} + SET_PHASE(DIR_PHASE_7); + + /* rename cur_target src */ + rval = rename(cur_target, srcname); + if (rval != 0) { + /* + * we can't abort since the src dir is now gone. 
+ * let the admin clean this one up + */ + err_message(_("unable to rename directory: %s to %s"), + cur_target, srcname); + } + goto quit; + + quit_undo: + if (move_dirents(cur_target, srcname, &move_count) != 0) { + /* oh, dear lord... let the admin clean this one up */ + err_message(_("unable to move directory contents back: %s to %s"), + cur_target, srcname); + goto quit; + } Can we avoid these 'leave it to the admin to clean up' problems? Could we rename the source directory to a temporary name, rename the target to the source name and if all that works, remove the temporary source otherwise remove the target and rename the source back again? Not sure if that actually buys us anything since we're back to square one if we can't rename the temporary source back again. +static void +update_recoverfile(void) +{ + static const char null_file[] = "0\n0\n0\n\ntarget: \ntemp: \nend\n"; + static size_t buf_size = 0; + static char *buf = NULL; + int i, len; + + if (recover_fd <= 0) + return; + + if (cur_node == NULL || cur_phase == 0) { + /* inbetween processing or still scanning */ + lseek(recover_fd, 0, SEEK_SET); + write(recover_fd, null_file, sizeof(null_file)); + return; + } + + ASSERT(highest_numpaths > 0); + if (buf == NULL) { + buf_size = (highest_numpaths + 3) * PATH_MAX; + buf = malloc(buf_size); + if (buf == NULL) { + err_nomem(); + exit(1); + } + } Should you check if highest_numpaths has increased and realloc the buffer? Or will we have finished the scan by the time we get here? + + len = sprintf(buf, "%d\n%llu\n%d\n", cur_phase, + (long long)cur_node->ino, cur_node->ftw_flags); + + for (i = 0; i < cur_node->numpaths; i++) + len += sprintf(buf + len, "%s\n", cur_node->paths[i]); + + len += sprintf(buf + len, "target: %s\ntemp: %s\nend\n", + cur_target, cur_temp); + + ASSERT(len < buf_size); Can we use snprintf() instead? + + lseek(recover_fd, 0, SEEK_SET); + ftruncate(recover_fd, 0); + write(recover_fd, buf, len); +} What's the test plan for xfs_reno? 
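On the snprintf() question for update_recoverfile(): one way to make the bound explicit is a small append helper built on vsnprintf(). This is a sketch only; buf_appendf is a hypothetical name and does not exist in xfs_reno. It refuses to advance the length when the formatted text would not fit, instead of relying on an ASSERT after the buffer has already been overrun.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Append formatted text to buf at offset *len without writing past size.
 * Returns 0 on success, or -1 if formatting failed or the text did not
 * fit. On failure the buffer is cut back to its previous contents.
 * Hypothetical helper, not part of xfs_reno.
 */
static int buf_appendf(char *buf, size_t size, size_t *len,
		       const char *fmt, ...)
{
	va_list ap;
	int n;

	if (*len >= size)
		return -1;

	va_start(ap, fmt);
	n = vsnprintf(buf + *len, size - *len, fmt, ap);
	va_end(ap);

	if (n < 0 || (size_t)n >= size - *len) {
		buf[*len] = '\0';	/* drop the partial output */
		return -1;
	}

	*len += (size_t)n;
	return 0;
}
```

A caller would replace each `len += sprintf(buf + len, ...)` with a call such as `buf_appendf(buf, buf_size, &len, "%s\n", path)` and treat a -1 return as the signal to grow the buffer or bail out.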
Barry Naujok wrote: > A couple changes from the first xfs_reno: > > - Major one is that symlinks are now supported, but only > owner, group and extended attributes are copied for them > (not times or inode attributes). > > - Man page! > > > To make this better, ideally we need some form of > "swap inodes" function in the kernel, where the entire > contents of the inode themselves are swapped. This form > can handle any inode and without any of the dir/file/attr/etc > copy/swap mechanisms we have in xfs_reno. > > Barry. From owner-xfs@oss.sgi.com Mon Nov 19 05:07:01 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 19 Nov 2007 05:07:04 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAJD7079023651 for ; Mon, 19 Nov 2007 05:07:01 -0800 Received: from hch by pentafluge.infradead.org with local (Exim 4.63 #1 (Red Hat Linux)) id 1Iu5uI-00049e-Pe; Mon, 19 Nov 2007 12:39:18 +0000 Date: Mon, 19 Nov 2007 12:39:18 +0000 From: Christoph Hellwig To: Timothy Shimmin Cc: Vlad Apostolov , Barry Naujok , "xfs@oss.sgi.com" , xfs-dev Subject: Re: REVIEW: xfs_reno #2 Message-ID: <20071119123918.GA15942@infradead.org> References: <473D32D9.2020500@sgi.com> <473D36AC.7040507@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <473D36AC.7040507@sgi.com> User-Agent: Mutt/1.4.2.3i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-Virus-Scanned: ClamAV 0.91.2/4841/Sun Nov 18 20:25:36 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13700 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com 
X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs On Fri, Nov 16, 2007 at 05:20:28PM +1100, Timothy Shimmin wrote: > Vlad Apostolov wrote: > > > >When the XFS parent pointers feature is released we would need to find > >out to update the EA to point to the new inode parent directory. This may > >not be that easy though. > > > Really? > Apart from the swapping of extents, reno uses standard calls doesn't it, > in which case any movement of inodes will have the parent pointers > updated by the normal vnode ops (e.g. mkdir, rename) in the kernel. What parent pointers? From owner-xfs@oss.sgi.com Mon Nov 19 07:52:43 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 19 Nov 2007 07:52:50 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS autolearn=ham version=3.3.0-r574664 Received: from sandeen.net (sandeen.net [209.173.210.139]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAJFqgbN018233 for ; Mon, 19 Nov 2007 07:52:43 -0800 Received: from liberator.sandeen.net (liberator.sandeen.net [10.0.0.4]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by sandeen.net (Postfix) with ESMTP id 28BE41800F676; Mon, 19 Nov 2007 09:52:46 -0600 (CST) Message-ID: <4741B14D.8020004@sandeen.net> Date: Mon, 19 Nov 2007 09:52:45 -0600 From: Eric Sandeen User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728) MIME-Version: 1.0 To: Christoph Hellwig CC: Timothy Shimmin , Vlad Apostolov , Barry Naujok , "xfs@oss.sgi.com" , xfs-dev Subject: Re: REVIEW: xfs_reno #2 References: <473D32D9.2020500@sgi.com> <473D36AC.7040507@sgi.com> <20071119123918.GA15942@infradead.org> In-Reply-To: <20071119123918.GA15942@infradead.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4841/Sun Nov 18 20:25:36 2007 on oss.sgi.com 
X-Virus-Status: Clean X-archive-position: 13701 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: sandeen@sandeen.net Precedence: bulk X-list: xfs Christoph Hellwig wrote: > On Fri, Nov 16, 2007 at 05:20:28PM +1100, Timothy Shimmin wrote: >> Vlad Apostolov wrote: >>> When the XFS parent pointers feature is released we would need to find >>> out to update the EA to point to the new inode parent directory. This may >>> not be that easy though. >>> >> Really? >> Apart from the swapping of extents, reno uses standard calls doesn't it, >> in which case any movement of inodes will have the parent pointers >> updated by the normal vnode ops (e.g. mkdir, rename) in the kernel. > > What parent pointers? > > The ones not yet released I guess :) >> Vlad Apostolov wrote: >>> When the XFS parent pointers feature is released... -Eric From owner-xfs@oss.sgi.com Mon Nov 19 10:42:08 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 19 Nov 2007 10:42:21 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=0.1 required=5.0 tests=BAYES_50,J_CHICKENPOX_23, T_STOX_BOUND_090909_B autolearn=no version=3.3.0-r574664 Received: from strike.wu-wien.ac.at (strike.wu-wien.ac.at [137.208.89.120]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAJIg0gc018040 for ; Mon, 19 Nov 2007 10:42:07 -0800 Received: from localhost (localhost.localdomain [127.0.0.1]) by strike.wu-wien.ac.at (Postfix) with ESMTP id 8F8DAC020B2; Mon, 19 Nov 2007 19:10:38 +0100 (CET) X-Virus-Scanned: ClamAV 0.91.2/4843/Mon Nov 19 08:02:15 2007 on oss.sgi.com X-Virus-Scanned: amavisd-new at strike.wu-wien.ac.at Received: from strike.wu-wien.ac.at ([127.0.0.1]) by localhost (strike.wu-wien.ac.at [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id DKKoDLOe6FzE; Mon, 19 Nov 2007 19:10:37 +0100 (CET) Received: from [137.208.89.100] (ariel.wu-wien.ac.at 
[137.208.89.100]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: leo) by strike.wu-wien.ac.at (Postfix) with ESMTP; Mon, 19 Nov 2007 19:10:37 +0100 (CET) Message-ID: <4741D198.2060908@strike.wu-wien.ac.at> Date: Mon, 19 Nov 2007 19:10:32 +0100 From: Alexander Bergolth User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.12) Gecko/20070530 Fedora/1.5.0.12-1.fc5 Thunderbird/1.5.0.12 Mnenhy/0.7.5.0 MIME-Version: 1.0 To: Chris Wedgwood CC: Emmanuel Florac , Eric Sandeen , xfs@oss.sgi.com Subject: Re: XFS crash on linux raid References: <20070503164521.16efe075@harpe.intellique.com> <20070504005922.GC32602149@melbourne.sgi.com> <20070504090613.7c0f97d3@galadriel.home> <20070504073344.GL32602149@melbourne.sgi.com> <20070504152546.614374ac@harpe.intellique.com> <463B4962.70904@sandeen.net> <20070504173049.14606033@harpe.intellique.com> <20070504232028.GA19744@tuatara.stupidest.org> In-Reply-To: <20070504232028.GA19744@tuatara.stupidest.org> X-Enigmail-Version: 0.94.2.0 Content-Type: multipart/mixed; boundary="------------090607070609050208000105" X-Virus-Status: Clean X-archive-position: 13702 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: leo@strike.wu-wien.ac.at Precedence: bulk X-list: xfs This is a multi-part message in MIME format. --------------090607070609050208000105 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Hi! Several months ago, you posted a patch for 8k stacks together with irqstacks: On 05/05/2007 01:20 AM, Chris Wedgwood wrote: > Almost three years ago I posted patches to split the CONFIG_4KSTACKS > option into two options. I quickly just ported that to 2.6.21 just > now (very quickly, I might have goofed fixing up the rejects). Do you have a working version of your patch for 2.6.23? 
I've been using a similar patch (attached) for several years now but since 2.6.23, it produces oopses like below. The patch applies cleanly but there seems to be some major change between 2.6.22 and 2.6.23 that isn't covered correctly. Thanks, --leo P.S.: The attached patch is part of ATrpms' 8k kernel: http://atrpms.net/dist/f8/kernel-tuxonice/ 2.6.23.1-49_0.99.cubbi_tuxonice_8k and 2.6.23.1-42_0.99.cubbi_tuxonice_8k both show the problem, the corresponding versions without _8k work without any problem. 2.6.22.1-41_0.99.cubbi_tuxonice_8k on Fedora 7 worked fine too. -------------------- 8< -------------------- BUG: unable to handle kernel NULL pointer dereference at virtual address 00000050 printing eip: 00000050 *pde = 3ed4c067 Oops: 0000 [#1] SMP Modules linked in: ipv6 ext2 mbcache loop ahci sr_mod cdrom ata_generic firewire_ohci firewire_core pata_pdc2027x crc_itu_t sata_sil iTCO_wdt i2c_i801 button parport_pc parport iTCO_vendor_support i2c_core ata_piix pcspkr intel_agp sky2 floppy sg dm_snapshot dm_zero dm_mirror dm_mod pata_it821x libata sd_mod scsi_mod raid456 async_xor async_memcpy async_tx xor raid1 xfs uhci_hcd ohci_hcd ehci_hcd CPU: 0 EIP: 0060:[<00000050>] Not tainted VLI EFLAGS: 00210082 (2.6.23.1-49_0.99.cubbi_tuxonice_8k.fc8 #1) EIP is at 0x50 eax: 00000001 ebx: f8947796 ecx: 01073000 edx: c0799200 esi: f8897310 edi: f79d8000 ebp: f79d8a2c esp: c07defa0 ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068 Process rklogd (pid: 1787, ti=c07de000 task=f74fb840 task.ti=f744e000) Stack: 00200292 f894472d 00000000 000000dc f7f5a6a0 f7f8f2b4 00000002 00000001 00200292 00000001 00000012 00000012 c041e022 c0465fcb c0741700 c0741700 00000012 00000000 c04672e5 c07def7c 00000012 c0467249 c0741700 c04074cb Call Trace: [] ata_interrupt+0x1ab/0x1be [libata] [] ack_ioapic_quirk_irq+0x34/0x86 [] handle_IRQ_event+0x23/0x51 [] handle_fasteoi_irq+0x9c/0xa6 [] handle_fasteoi_irq+0x0/0xa6 [] do_IRQ+0x8c/0xb9 ======================= Code: Bad EIP value. 
EIP: [<00000050>] 0x50 SS:ESP 0068:c07defa0 BUG: unable to handle kernel paging request at virtual address 010b2fc8 printing eip: c07dee7c *pde = 00000000 Oops: 0002 [#2] SMP Modules linked in: ipv6 ext2 mbcache loop ahci sr_mod cdrom ata_generic firewire_ohci firewire_core pata_pdc2027x crc_itu_t sata_sil iTCO_wdt i2c_i801 button parport_pc parport iTCO_vendor_support i2c_core ata_piix pcspkr intel_agp sky2 floppy sg dm_snapshot dm_zero dm_mirror dm_mod pata_it821x libata sd_mod scsi_mod raid456 async_xor async_memcpy async_tx xor raid1 xfs uhci_hcd ohci_hcd ehci_hcd CPU: 0 EIP: 0060:[] Tainted: G D VLI EFLAGS: 00210086 (2.6.23.1-49_0.99.cubbi_tuxonice_8k.fc8 #1) EIP is at hardirq_stack+0x1e7c/0x40000 eax: 00200046 ebx: f74fb88c ecx: 00200286 edx: 01073000 esi: 00051b65 edi: 001e8480 ebp: 00200006 esp: c07dee58 ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068 Process rklogd (pid: 1787, ti=c07de000 task=f74fb840 task.ti=f744e000) Stack: c0425763 153fdb8a 00000001 13125d1e 00000001 f7bb5840 f74fb840 f7bb5840 c07dee90 c042ab0f 001e8480 00000000 00000001 00000000 00200082 c0427c1d ffffff10 c04300ad 00000000 0000000f f7bb5840 00000000 c180c200 00000000 Call Trace: [] __check_preempt_curr_fair+0x55/0x86 [] check_preempt_curr_fair+0x6b/0x71 [] try_to_wake_up+0x2ef/0x2f9 [] do_exit+0x11b/0x6fc [] del_timer+0x48/0x4e [] ata_scsi_qc_complete+0x3bd/0x3cb [libata] [] do_page_fault+0x521/0x5ef [] __ata_qc_complete+0x8c/0x92 [libata] [] ata_hsm_move+0x6d1/0x70c [libata] [] it821x_passthru_bmdma_stop+0x17/0x36 [pata_it821x] [] do_page_fault+0x0/0x5ef [] error_code+0x72/0x78 [] ata_altstatus+0x1c/0x20 [libata] [] ata_bmdma_stop+0x1a/0x23 [libata] [] ata_altstatus+0x1c/0x20 [libata] [] it821x_passthru_bmdma_stop+0x17/0x36 [pata_it821x] [] ata_interrupt+0x1ab/0x1be [libata] [] ack_ioapic_quirk_irq+0x34/0x86 [] handle_IRQ_event+0x23/0x51 [] handle_fasteoi_irq+0x9c/0xa6 [] handle_fasteoi_irq+0x0/0xa6 [] do_IRQ+0x8c/0xb9 ======================= Code: 00 00 00 86 00 21 00 63 57 42 c0 8a db 
3f 15 01 00 00 00 1e 5d 12 13 01 00 00 00 40 58 bb f7 40 b8 4f f7 40 58 bb f7 90 ee 7d c0 <0f> ab 42 c0 80 84 1e 00 00 00 00 00 01 00 00 00 00 00 00 00 82 EIP: [] hardirq_stack+0x1e7c/0x40000 SS:ESP 0068:c07dee58 Fixing recursive fault but reboot is needed! -------------------- 8< -------------------- -- e-mail ::: Alexander.Bergolth (at) wu-wien.ac.at fax ::: +43-1-31336-906050 location ::: Computer Center | Vienna University of Economics | Austria --------------090607070609050208000105 Content-Type: text/x-patch; name="linux-2.6.22-8k_stacks.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="linux-2.6.22-8k_stacks.patch" --- linux-2.6.22.i686/arch/i386/Kconfig.debug.orig 2007-07-09 01:32:17.000000000 +0200 +++ linux-2.6.22.i686/arch/i386/Kconfig.debug 2007-07-22 13:49:57.000000000 +0200 @@ -85,4 +85,9 @@ option saves about 4k and might cause you much additional grey hair. + config IRQSTACKS + bool "use IRQ stacks" + depends on !4KSTACKS + default n + endmenu --- linux-2.6.22.i686/arch/i386/kernel/irq.c.orig 2007-07-09 01:32:17.000000000 +0200 +++ linux-2.6.22.i686/arch/i386/kernel/irq.c 2007-07-22 13:50:50.000000000 +0200 @@ -50,7 +50,7 @@ #endif } -#ifdef CONFIG_4KSTACKS +#if defined(CONFIG_4KSTACKS) || defined(CONFIG_IRQSTACKS) /* * per-CPU IRQ handling contexts (thread information and stack) */ @@ -74,7 +74,7 @@ /* high bit used in ret_from_ code */ int irq = ~regs->orig_eax; struct irq_desc *desc = irq_desc + irq; -#ifdef CONFIG_4KSTACKS +#if defined(CONFIG_4KSTACKS) || defined(CONFIG_IRQSTACKS) union irq_ctx *curctx, *irqctx; u32 *isp; #endif @@ -102,7 +102,7 @@ } #endif -#ifdef CONFIG_4KSTACKS +#if defined(CONFIG_4KSTACKS) || defined(CONFIG_IRQSTACKS) curctx = (union irq_ctx *) current_thread_info(); irqctx = hardirq_ctx[smp_processor_id()]; @@ -147,7 +147,7 @@ return 1; } -#ifdef CONFIG_4KSTACKS +#if defined(CONFIG_4KSTACKS) || defined(CONFIG_IRQSTACKS) static char softirq_stack[NR_CPUS * THREAD_SIZE] 
__attribute__((__section__(".bss.page_aligned"))); --- linux-2.6.22.i686/include/asm-i386/irq.h.orig 2007-07-09 01:32:17.000000000 +0200 +++ linux-2.6.22.i686/include/asm-i386/irq.h 2007-07-22 13:51:34.000000000 +0200 @@ -24,7 +24,7 @@ # define ARCH_HAS_NMI_WATCHDOG /* See include/linux/nmi.h */ #endif -#ifdef CONFIG_4KSTACKS +#if defined(CONFIG_4KSTACKS) || defined(CONFIG_IRQSTACKS) extern void irq_ctx_init(int cpu); extern void irq_ctx_exit(int cpu); # define __ARCH_HAS_DO_SOFTIRQ --------------090607070609050208000105-- From owner-xfs@oss.sgi.com Mon Nov 19 14:08:12 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 19 Nov 2007 14:08:15 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.1 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAJM89cO016131 for ; Mon, 19 Nov 2007 14:08:11 -0800 Received: from [134.14.55.89] (soarer.melbourne.sgi.com [134.14.55.89]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA17075; Tue, 20 Nov 2007 09:08:03 +1100 Message-ID: <4742094C.2020705@sgi.com> Date: Tue, 20 Nov 2007 09:08:12 +1100 From: Vlad Apostolov User-Agent: Thunderbird 2.0.0.6 (X11/20070728) MIME-Version: 1.0 To: Eric Sandeen CC: Christoph Hellwig , Timothy Shimmin , Barry Naujok , "xfs@oss.sgi.com" , xfs-dev Subject: Re: REVIEW: xfs_reno #2 References: <473D32D9.2020500@sgi.com> <473D36AC.7040507@sgi.com> <20071119123918.GA15942@infradead.org> <4741B14D.8020004@sandeen.net> In-Reply-To: <4741B14D.8020004@sandeen.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4846/Mon Nov 19 10:26:19 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13703 X-ecartis-version: Ecartis v1.0.0 Sender: 
xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: vapo@sgi.com Precedence: bulk X-list: xfs Eric Sandeen wrote: > Christoph Hellwig wrote: > >> On Fri, Nov 16, 2007 at 05:20:28PM +1100, Timothy Shimmin wrote: >> >>> Vlad Apostolov wrote: >>> >>>> When the XFS parent pointers feature is released we would need to find >>>> out to update the EA to point to the new inode parent directory. This may >>>> not be that easy though. >>>> >>>> >>> Really? >>> Apart from the swapping of extents, reno uses standard calls doesn't it, >>> in which case any movement of inodes will have the parent pointers >>> updated by the normal vnode ops (e.g. mkdir, rename) in the kernel. >>> >> What parent pointers? >> >> >> > > The ones not yet released I guess :) > It is a released and existing feature on XFS for Irix, that we are back porting to Linux. You will see some patches soon. Regards, Vlad > >>> Vlad Apostolov wrote: >>> >>>> When the XFS parent pointers feature is released... >>>> > > -Eric > From owner-xfs@oss.sgi.com Mon Nov 19 15:44:11 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 19 Nov 2007 15:44:15 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from smtp115.sbc.mail.sp1.yahoo.com (smtp115.sbc.mail.sp1.yahoo.com [69.147.64.88]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAJNi9cw027349 for ; Mon, 19 Nov 2007 15:44:11 -0800 Received: (qmail 2852 invoked from network); 19 Nov 2007 23:44:17 -0000 Received: from unknown (HELO stupidest.org) (cwedgwood@sbcglobal.net@75.36.198.62 with login) by smtp115.sbc.mail.sp1.yahoo.com with SMTP; 19 Nov 2007 23:44:16 -0000 X-YMail-OSG: 1W2JClAVM1nmNp4JAuS8Vkfwo9cIIIprZLxpV8BSlyfP9yJaZNmH2XCVMU74dpsbeY.I5eVXdw-- Received: by tuatara.stupidest.org (Postfix, from userid 10000) id 8BF2C28290F8; Mon, 19 Nov 2007 15:44:15 -0800 
(PST) Date: Mon, 19 Nov 2007 15:44:15 -0800 From: Chris Wedgwood To: Alexander Bergolth Cc: Emmanuel Florac , Eric Sandeen , xfs@oss.sgi.com Subject: Re: XFS crash on linux raid Message-ID: <20071119234415.GB29042@puku.stupidest.org> References: <20070503164521.16efe075@harpe.intellique.com> <20070504005922.GC32602149@melbourne.sgi.com> <20070504090613.7c0f97d3@galadriel.home> <20070504073344.GL32602149@melbourne.sgi.com> <20070504152546.614374ac@harpe.intellique.com> <463B4962.70904@sandeen.net> <20070504173049.14606033@harpe.intellique.com> <20070504232028.GA19744@tuatara.stupidest.org> <4741D198.2060908@strike.wu-wien.ac.at> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4741D198.2060908@strike.wu-wien.ac.at> X-Virus-Scanned: ClamAV 0.91.2/4848/Mon Nov 19 14:34:22 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13704 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: cw@f00f.org Precedence: bulk X-list: xfs

On Mon, Nov 19, 2007 at 07:10:32PM +0100, Alexander Bergolth wrote:

> Several months ago, you posted a patch for 8k stacks together with
> irqstacks:

I've posted it off and on for two or three years now --- quite a few people are using it or the equivalent of it, but thus far people resist merging this. The other idea (also nixed) that I posed was to disable XFS if people are using 4KSTACKS.

> Do you have a working version of your patch for 2.6.23?

No. I can go dig out a 32-bit machine and retest though.

> I've been using a similar patch (attached) for several years now but
> since 2.6.23, it produces oopses like below. The patch applies
> cleanly but there seems to be some major change between 2.6.22 and
> 2.6.23 that isn't covered correctly.

I'll take a look. Can you make sure it works with 8K stacks (more or less works).
From owner-xfs@oss.sgi.com Mon Nov 19 17:37:00 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 19 Nov 2007 17:37:04 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAK1apNa009011 for ; Mon, 19 Nov 2007 17:36:58 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA23830; Tue, 20 Nov 2007 12:36:53 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAK1aqdD112962984; Tue, 20 Nov 2007 12:36:52 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAK1apCM112932193; Tue, 20 Nov 2007 12:36:51 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Tue, 20 Nov 2007 12:36:51 +1100 From: David Chinner To: Barry Naujok Cc: "xfs@oss.sgi.com" , xfs-dev Subject: Re: REVIEW: xfs_reno #2 Message-ID: <20071120013651.GR995458@sgi.com> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4848/Mon Nov 19 14:34:22 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13705 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Thu, Oct 04, 2007 at 02:25:16PM +1000, Barry Naujok wrote: > A couple changes from the first xfs_reno: > > - Major one is that symlinks are now supported, but only > owner, group and extended attributes are copied for them > (not times or 
inode attributes). > > - Man page! > > > To make this better, ideally we need some form of > "swap inodes" function in the kernel, where the entire > contents of the inode themselves are swapped. This form > can handle any inode and without any of the dir/file/attr/etc > copy/swap mechanisms we have in xfs_reno. Something like the attached patch? This is proof-of-concept. I've compiled it but I haven't tested it. Your mission, Barry, should you choose to accept it, is to make it work ;) Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group --- fs/xfs/linux-2.6/xfs_ioctl.c | 4 fs/xfs/xfs_dfrag.c | 313 ++++++++++++++++++++++++++++++++++++------- fs/xfs/xfs_dfrag.h | 24 ++- fs/xfs/xfs_fs.h | 1 fs/xfs/xfs_trans.h | 3 fs/xfs/xfsidbg.c | 9 - 6 files changed, 297 insertions(+), 57 deletions(-) Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_ioctl.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_ioctl.c 2007-11-16 10:27:41.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_ioctl.c 2007-11-20 11:18:45.829822690 +1100 @@ -817,6 +817,10 @@ xfs_ioctl( error = xfs_swapext((struct xfs_swapext __user *)arg); return -error; } + case XFS_IOC_SWAPINO: { + error = xfs_swapino((struct xfs_swapino __user *)arg); + return -error; + } case XFS_IOC_FSCOUNTS: { xfs_fsop_counts_t out; Index: 2.6.x-xfs-new/fs/xfs/xfs_dfrag.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_dfrag.c 2007-11-16 10:27:41.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_dfrag.c 2007-11-20 11:41:28.196327293 +1100 @@ -44,6 +44,20 @@ #include "xfs_rw.h" #include "xfs_vnodeops.h" + +STATIC int +xfs_swap_fd_to_inode(int fd, struct file **filp, xfs_inode_t **ip) +{ + *filp = fget(fd); + if (!*filp) + return EINVAL; + + *ip = XFS_I((*filp)->f_path.dentry->d_inode); + if (!*ip) + return EBADF; + return 0; +} + /* * Syssgi interface for swapext */ @@ -53,75 +67,85 @@ xfs_swapext( {
xfs_swapext_t *sxp; xfs_inode_t *ip=NULL, *tip=NULL; - xfs_mount_t *mp; struct file *fp = NULL, *tfp = NULL; - bhv_vnode_t *vp, *tvp; - int error = 0; + int error; + error = ENOMEM; sxp = kmem_alloc(sizeof(xfs_swapext_t), KM_MAYFAIL); - if (!sxp) { - error = XFS_ERROR(ENOMEM); + if (!sxp) goto error0; - } - - if (copy_from_user(sxp, sxu, sizeof(xfs_swapext_t))) { - error = XFS_ERROR(EFAULT); + error = EFAULT; + if (copy_from_user(sxp, sxu, sizeof(xfs_swapext_t))) goto error0; - } - /* Pull information for the target fd */ - if (((fp = fget((int)sxp->sx_fdtarget)) == NULL) || - ((vp = vn_from_inode(fp->f_path.dentry->d_inode)) == NULL)) { - error = XFS_ERROR(EINVAL); + error = xfs_swap_fd_to_inode((int)sxp->sx_fdtarget, &fp, &ip); + if (error) goto error0; - } - - ip = xfs_vtoi(vp); - if (ip == NULL) { - error = XFS_ERROR(EBADF); + error = xfs_swap_fd_to_inode((int)sxp->sx_fdtmp, &tfp, &tip); + if (error) goto error0; - } - if (((tfp = fget((int)sxp->sx_fdtmp)) == NULL) || - ((tvp = vn_from_inode(tfp->f_path.dentry->d_inode)) == NULL)) { - error = XFS_ERROR(EINVAL); + error = EINVAL; + if (ip->i_mount != tip->i_mount) goto error0; - } - - tip = xfs_vtoi(tvp); - if (tip == NULL) { - error = XFS_ERROR(EBADF); + if (ip->i_ino == tip->i_ino) goto error0; - } - - if (ip->i_mount != tip->i_mount) { - error = XFS_ERROR(EINVAL); + error = EIO; + if (XFS_FORCED_SHUTDOWN(ip->i_mount)) goto error0; - } - if (ip->i_ino == tip->i_ino) { - error = XFS_ERROR(EINVAL); - goto error0; - } + error = xfs_swap_extents(ip, tip, sxp); +error0: + if (fp != NULL) + fput(fp); + if (tfp != NULL) + fput(tfp); + if (sxp != NULL) + kmem_free(sxp, sizeof(xfs_swapext_t)); + return error; +} - mp = ip->i_mount; - if (XFS_FORCED_SHUTDOWN(mp)) { - error = XFS_ERROR(EIO); +int +xfs_swapino( + xfs_swapino_t __user *siu) +{ + xfs_swapino_t *sino; + xfs_inode_t *ip=NULL, *tip=NULL; + struct file *fp = NULL, *tfp = NULL; + int error; + + error = ENOMEM; + sino = kmem_alloc(sizeof(xfs_swapino_t), 
KM_MAYFAIL); + if (!sino) + goto error0; + error = EFAULT; + if (copy_from_user(sino, siu, sizeof(xfs_swapino_t))) goto error0; - } - error = xfs_swap_extents(ip, tip, sxp); + error = xfs_swap_fd_to_inode((int)sino->sx_fdtarget, &fp, &ip); + if (error) + goto error0; + error = xfs_swap_fd_to_inode((int)sino->sx_fdtmp, &tfp, &tip); + if (error) + goto error0; + error = EINVAL; + if (ip->i_mount != tip->i_mount) + goto error0; + if (ip->i_ino == tip->i_ino) + goto error0; + error = EIO; + if (XFS_FORCED_SHUTDOWN(ip->i_mount)) + goto error0; - error0: + error = xfs_swap_inodes(ip, tip, sino); +error0: if (fp != NULL) fput(fp); if (tfp != NULL) fput(tfp); - - if (sxp != NULL) - kmem_free(sxp, sizeof(xfs_swapext_t)); - + if (sino != NULL) + kmem_free(sino, sizeof(xfs_swapino_t)); return error; } @@ -397,3 +421,198 @@ xfs_swap_extents( kmem_free(tempifp, sizeof(xfs_ifork_t)); return error; } + +STATIC void +xfs_swapino_log_fields( + xfs_trans_t *tp, + xfs_inode_t *ip) +{ + int ilf_fields = XFS_ILOG_CORE; + + switch(ip->i_d.di_format) { + case XFS_DINODE_FMT_EXTENTS: + /* If the extents fit in the inode, fix the + * pointer. Otherwise it's already NULL or + * pointing to the extent. + */ + if (ip->i_d.di_nextents <= XFS_INLINE_EXTS) { + xfs_ifork_t *ifp = &ip->i_df; + ifp->if_u1.if_extents = ifp->if_u2.if_inline_ext; + } + ilf_fields |= XFS_ILOG_DEXT; + break; + case XFS_DINODE_FMT_BTREE: + ilf_fields |= XFS_ILOG_DBROOT; + break; + } + + switch(ip->i_d.di_aformat) { + case XFS_DINODE_FMT_LOCAL: + ilf_fields |= XFS_ILOG_ADATA; + break; + case XFS_DINODE_FMT_EXTENTS: + /* If the extents fit in the inode, fix the + * pointer. Otherwise it's already NULL or + * pointing to the extent. 
+ */ + if (ip->i_d.di_nextents <= XFS_INLINE_EXTS) { + xfs_ifork_t *ifp = ip->i_afp; + ifp->if_u1.if_extents = ifp->if_u2.if_inline_ext; + } + ilf_fields |= XFS_ILOG_AEXT; + break; + case XFS_DINODE_FMT_BTREE: + ilf_fields |= XFS_ILOG_ABROOT; + break; + } + xfs_trans_log_inode(tp, ip, ilf_fields); +} + +int +xfs_swap_inodes( + xfs_inode_t *ip, + xfs_inode_t *tip, + xfs_swapino_t *sino) +{ + xfs_mount_t *mp; + xfs_inode_t *ips[2]; + xfs_trans_t *tp; + xfs_icdinode_t *dic = NULL; + xfs_ifork_t *tempifp, *ifp, *tifp, *i_afp; + static uint lock_flags = XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL; + int error; + char locked = 0; + + mp = ip->i_mount; + error = ENOMEM; + tempifp = kmem_alloc(sizeof(xfs_ifork_t), KM_MAYFAIL); + if (!tempifp) + goto error0; + dic = kmem_alloc(sizeof(xfs_dinode_core_t), KM_MAYFAIL); + if (!dic) + goto error0; + + /* Lock in i_ino order */ + if (ip->i_ino < tip->i_ino) { + ips[0] = ip; + ips[1] = tip; + } else { + ips[0] = tip; + ips[1] = ip; + } + + xfs_lock_inodes(ips, 2, 0, lock_flags); + locked = 1; + + /* Check permissions */ + error = xfs_iaccess(ip, S_IWUSR, NULL); + if (error) + goto error0; + error = xfs_iaccess(tip, S_IWUSR, NULL); + if (error) + goto error0; + + /* Verify that both files have the same format */ + error = EINVAL; + if ((ip->i_d.di_mode & S_IFMT) != (tip->i_d.di_mode & S_IFMT)) + goto error0; + + /* Verify both files are either real-time or non-realtime */ + if (XFS_IS_REALTIME_INODE(ip) != XFS_IS_REALTIME_INODE(tip)) + goto error0; + + if (VN_CACHED(tip->i_vnode) != 0) { + xfs_inval_cached_trace(tip, 0, -1, 0, -1); + error = xfs_flushinval_pages(tip, 0, -1, + FI_REMAPF_LOCKED); + if (error) + goto error0; + } + + xfs_iunlock(ip, XFS_ILOCK_EXCL); + xfs_iunlock(tip, XFS_ILOCK_EXCL); + + /* + * There is a race condition here since we gave up the + * ilock. However, the data fork will not change since + * we have the iolock (locked for truncation too) so we + * are safe. We don't really care if non-io related + * fields change. 
+ */ + + xfs_tosspages(ip, 0, -1, FI_REMAPF); + + tp = xfs_trans_alloc(mp, XFS_TRANS_SWAPINO); + error = xfs_trans_reserve(tp, 0, 2 * XFS_ICHANGE_LOG_RES(mp), 0, 0, 0); + if (error) { + xfs_iunlock(ip, XFS_IOLOCK_EXCL); + xfs_iunlock(tip, XFS_IOLOCK_EXCL); + xfs_trans_cancel(tp, 0); + locked = 0; + goto error0; + } + xfs_lock_inodes(ips, 2, 0, XFS_ILOCK_EXCL); + + /* + * Swap the inode cores - structure copies. + */ + *dic = ip->i_d; + ip->i_d = tip->i_d; + tip->i_d = *dic; + + /* + * Swap the data forks of the inodes - structure copies + */ + ifp = &ip->i_df; + tifp = &tip->i_df; + *tempifp = *ifp; + *ifp = *tifp; + *tifp = *tempifp; + + /* + * Swap the attribute forks + */ + i_afp = ip->i_afp; + ip->i_afp = tip->i_afp; + tip->i_afp = i_afp; + + /* + * Increment vnode ref counts since xfs_trans_commit & + * xfs_trans_cancel will both unlock the inodes and + * decrement the associated ref counts. + */ + VN_HOLD(ip->i_vnode); + VN_HOLD(tip->i_vnode); + xfs_trans_ijoin(tp, ip, lock_flags); + xfs_trans_ijoin(tp, tip, lock_flags); + + + /* + * log both entire inodes + */ + xfs_swapino_log_fields(tp, ip); + xfs_swapino_log_fields(tp, tip); + + /* + * If this is a synchronous mount, make sure that the + * transaction goes to disk before returning to the user. 
+ */ + if (mp->m_flags & XFS_MOUNT_WSYNC) + xfs_trans_set_sync(tp); + + error = xfs_trans_commit(tp, XFS_TRANS_SWAPINO); + locked = 0; + + error0: + if (locked) { + xfs_iunlock(ip, lock_flags); + xfs_iunlock(tip, lock_flags); + } + vn_revalidate(ip->i_vnode); + vn_revalidate(tip->i_vnode); + if (dic) + kmem_free(dic, sizeof(xfs_icdinode_t)); + if (tempifp) + kmem_free(tempifp, sizeof(xfs_ifork_t)); + return error; +} Index: 2.6.x-xfs-new/fs/xfs/xfs_dfrag.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_dfrag.h 2007-01-16 10:54:17.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_dfrag.h 2007-11-20 11:20:01.364010037 +1100 @@ -21,7 +21,6 @@ /* * Structure passed to xfs_swapext */ - typedef struct xfs_swapext { __int64_t sx_version; /* version */ @@ -38,19 +37,34 @@ typedef struct xfs_swapext */ #define XFS_SX_VERSION 0 -#ifdef __KERNEL__ /* - * Prototypes for visible xfs_dfrag.c routines. + * Structure passed to xfs_swapext */ +typedef struct xfs_swapino +{ + __int64_t sx_version; /* version */ + __int64_t sx_fdtarget; /* fd of target file */ + __int64_t sx_fdtmp; /* fd of tmp file */ + char sx_pad[16]; /* pad space, unused */ +} xfs_swapino_t; /* - * Syscall interface for xfs_swapext + * Version flag + */ +#define XFS_SI_VERSION 0 + +#ifdef __KERNEL__ +/* + * Prototypes for visible xfs_dfrag.c routines. 
*/ -int xfs_swapext(struct xfs_swapext __user *sx); +int xfs_swapext(struct xfs_swapext __user *sx); int xfs_swap_extents(struct xfs_inode *ip, struct xfs_inode *tip, struct xfs_swapext *sxp); +int xfs_swapino(struct xfs_swapino __user *si); +int xfs_swap_inodes(struct xfs_inode *ip, struct xfs_inode *tip, + struct xfs_swapino *sino); #endif /* __KERNEL__ */ #endif /* __XFS_DFRAG_H__ */ Index: 2.6.x-xfs-new/fs/xfs/xfs_fs.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_fs.h 2007-10-15 09:58:18.000000000 +1000 +++ 2.6.x-xfs-new/fs/xfs/xfs_fs.h 2007-11-20 11:19:54.640883392 +1100 @@ -480,6 +480,7 @@ typedef struct xfs_handle { #define XFS_IOC_ATTRMULTI_BY_HANDLE _IOW ('X', 123, struct xfs_fsop_attrmulti_handlereq) #define XFS_IOC_FSGEOMETRY _IOR ('X', 124, struct xfs_fsop_geom) #define XFS_IOC_GOINGDOWN _IOR ('X', 125, __uint32_t) +#define XFS_IOC_SWAPINO _IOWR('X', 126, struct xfs_swapino) /* XFS_IOC_GETFSUUID ---------- deprecated 140 */ Index: 2.6.x-xfs-new/fs/xfs/xfs_trans.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_trans.h 2007-11-16 11:32:26.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_trans.h 2007-11-20 11:28:17.027542129 +1100 @@ -95,7 +95,8 @@ typedef struct xfs_trans_header { #define XFS_TRANS_GROWFSRT_FREE 39 #define XFS_TRANS_SWAPEXT 40 #define XFS_TRANS_SB_COUNT 41 -#define XFS_TRANS_TYPE_MAX 41 +#define XFS_TRANS_SWAPINO 42 +#define XFS_TRANS_TYPE_MAX 42 /* new transaction types need to be reflected in xfs_logprint(8) */ Index: 2.6.x-xfs-new/fs/xfs/xfsidbg.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfsidbg.c 2007-11-16 11:32:26.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfsidbg.c 2007-11-20 11:29:03.701447244 +1100 @@ -5911,11 +5911,12 @@ xfsidbg_print_trans_type(unsigned int t_ case XFS_TRANS_GROWFSRT_ALLOC: kdb_printf("GROWFSRT_ALLOC"); break; case XFS_TRANS_GROWFSRT_ZERO: 
kdb_printf("GROWFSRT_ZERO"); break; case XFS_TRANS_GROWFSRT_FREE: kdb_printf("GROWFSRT_FREE"); break; - case XFS_TRANS_SWAPEXT: kdb_printf("SWAPEXT"); break; + case XFS_TRANS_SWAPEXT: kdb_printf("SWAPEXT"); break; case XFS_TRANS_SB_COUNT: kdb_printf("SB_COUNT"); break; - case XFS_TRANS_DUMMY1: kdb_printf("DUMMY1"); break; - case XFS_TRANS_DUMMY2: kdb_printf("DUMMY2"); break; - case XLOG_UNMOUNT_REC_TYPE: kdb_printf("UNMOUNT"); break; + case XFS_TRANS_SWAPINO: kdb_printf("SWAPINO"); break; + case XFS_TRANS_DUMMY1: kdb_printf("DUMMY1"); break; + case XFS_TRANS_DUMMY2: kdb_printf("DUMMY2"); break; + case XLOG_UNMOUNT_REC_TYPE: kdb_printf("UNMOUNT"); break; default: kdb_printf("unknown(0x%x)", t_type); break; } } From owner-xfs@oss.sgi.com Mon Nov 19 19:44:52 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 19 Nov 2007 19:44:56 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=AWL,BAYES_00,SUBJ_FRIEND autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAK3ilK1022732 for ; Mon, 19 Nov 2007 19:44:51 -0800 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA28708; Tue, 20 Nov 2007 14:44:50 +1100 Received: by chook.melbourne.sgi.com (Postfix, from userid 1116) id A36FB58C4C0A; Tue, 20 Nov 2007 14:44:50 +1100 (EST) To: sgi.bugs.xfs@engr.sgi.com, xfs@oss.sgi.com Subject: PARTIAL TAKE 971186 - remove BPCSHIFT and friends Message-Id: <20071120034450.A36FB58C4C0A@chook.melbourne.sgi.com> Date: Tue, 20 Nov 2007 14:44:50 +1100 (EST) From: tes@sgi.com (Tim Shimmin) X-Virus-Scanned: ClamAV 0.91.2/4851/Mon Nov 19 18:35:53 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13706 X-ecartis-version: Ecartis v1.0.0 Sender: 
xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: tes@sgi.com Precedence: bulk X-list: xfs Remove the BPCSHIFT and NB* based macros from XFS. The BPCSHIFT based macros, btoc*, ctob*, offtoc* and ctooff are either not used or don't need to be used. The NDPP, NDPP, NBBY macros don't need to be used but instead are replaced directly by PAGE_SIZE and PAGE_CACHE_SIZE where appropriate. Initial patch and motivation from Nicolas Kaiser. Bunch of reviews from Dave Chinner and final from Lachlan. Date: Tue Nov 20 14:42:18 AEDT 2007 Workarea: chook.melbourne.sgi.com:/build/tes/2.6.x-xfs Inspected by: lachlan@sgi.com The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/linux/2.6.x-xfs-melb Modid: xfs-linux-melb:xfs-kern:30096a fs/xfs/xfs_log.c - 1.345 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_log.c.diff?r1=text&tr1=1.345&r2=text&tr2=1.344&f=h - simplify and use PAGE_SIZE fs/xfs/xfs_vnodeops.c - 1.726 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_vnodeops.c.diff?r1=text&tr1=1.726&r2=text&tr2=1.725&f=h - simplify and use PAGE_CACHE_SIZE and reuse ioffset fs/xfs/xfs_itable.c - 1.159 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_itable.c.diff?r1=text&tr1=1.159&r2=text&tr2=1.158&f=h - simplify and use PAGE_SIZE fs/xfs/xfs_bmap.c - 1.382 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_bmap.c.diff?r1=text&tr1=1.382&r2=text&tr2=1.381&f=h - simplify and use PAGE_CACHE_SIZE fs/xfs/quota/xfs_qm.h - 1.18 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/quota/xfs_qm.h.diff?r1=text&tr1=1.18&r2=text&tr2=1.17&f=h - simplify and use PAGE_SIZE fs/xfs/linux-2.6/xfs_lrw.c - 1.272 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.6/xfs_lrw.c.diff?r1=text&tr1=1.272&r2=text&tr2=1.271&f=h - simplify and use PAGE_CACHE_MASK fs/xfs/linux-2.6/xfs_linux.h - 1.162 - changed 
http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.6/xfs_linux.h.diff?r1=text&tr1=1.162&r2=text&tr2=1.161&f=h - The BPCSHIFT based macros, btoc*, ctob*, offtoc* and ctooff are either not used or don't need to be used. The NDPP, NDPP, NBBY macros don't need to be used but instead are replaced directly by PAGE_SIZE and PAGE_CACHE_SIZE where appropriate. From owner-xfs@oss.sgi.com Mon Nov 19 21:09:56 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 19 Nov 2007 21:09:59 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAK59pVY002621 for ; Mon, 19 Nov 2007 21:09:55 -0800 Received: from timothy-shimmins-power-mac-g5.local (boing.melbourne.sgi.com [134.14.55.141]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA00706; Tue, 20 Nov 2007 16:09:53 +1100 Message-ID: <47426C70.3070704@sgi.com> Date: Tue, 20 Nov 2007 16:11:12 +1100 From: Timothy Shimmin User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728) MIME-Version: 1.0 To: Andreas Gruenbacher CC: linux-xfs@oss.sgi.com, Gerald Bringhurst , Brandon Philips Subject: Re: acl and attr: Fix path walking code References: <200710281858.24428.agruen@suse.de> <4733F301.9020706@sgi.com> <200711102152.05619.agruen@suse.de> <473A82E6.50709@sgi.com> In-Reply-To: <473A82E6.50709@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4853/Mon Nov 19 19:05:00 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13707 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: tes@sgi.com Precedence: bulk X-list: xfs > > > > On Friday 09 November 2007 08:39:56 
Timothy Shimmin wrote: > >> > You mention -L/-P is like chown. > >> > However, -P for getattr isn't about not walking symlinks > >> > to directories, > >> > it's about skipping symlinks altogether, right? > > > > Hmm, -L and -P define which files and directories are visited, and -h > defines > > whether we are looking at symlinks or the files they point to. The two > > concepts are orthogonal. -P is not about skipping symlinks, only > about not > > recursing into them. > > > Oh okay. > There is the concept of following the symlink for traversal versus > following the symlink to get the EA on. > > So with -L should it just follow the symlink or look at the symlink first > and then follow it? > And will -h modify this behavior? > I'm still confused about the 1st difference in 062 output. > > I wonder if the man pages can be clarified in this area :) > > --Tim > Okay, looked at the code. --- no -h => stat, getxattr, listxattr -h => lstat, lgetxattr, llistxattr -P => skip symlinks (as soon as see them, then return from place in walk) -L => process symlink and then opendir on symlink (hence follow/traverse it) default => -L for argument (depth==1) and -P for subdirs (depth>1) ---- This was different than my intuition. I am happy with -h. For -P, I was expecting it to process symlink but not follow/traverse it, I wouldn't think it should just skip them altogether (I realise the man page says that). For -L, I wasn't sure if it should process it first and then traverse or simply just traverse. So were these your intentions? If so, the code seems to follow them but it would be nicer to have more explanation in the man page. 
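To make the rules above concrete, here is a minimal sketch in C of how the symlink decision could be written down. It is not the walk_tree.c code: the enum names and the symlink_action() helper are made up for illustration, and it only encodes the behaviour as read from the code in this mail (-P skips symlinks, -L processes and then traverses them, the default acts like -L at depth 1 and like -P below that).

```c
#include <assert.h>

/* Hypothetical names, not taken from attr/libmisc/walk_tree.c. */
enum walk_mode { WALK_DEFAULT, WALK_LOGICAL, WALK_PHYSICAL };	/* none, -L, -P */
enum walk_action { ACT_SKIP, ACT_PROCESS_AND_TRAVERSE };

/*
 * Decide what to do on meeting a symlink at the given depth.
 * depth == 1 means the path was given on the command line.
 */
static enum walk_action symlink_action(enum walk_mode mode, int depth)
{
	assert(depth >= 1);

	/* default: behave like -L for arguments, like -P for subdirs */
	if (mode == WALK_DEFAULT)
		mode = (depth == 1) ? WALK_LOGICAL : WALK_PHYSICAL;

	return (mode == WALK_PHYSICAL) ? ACT_SKIP : ACT_PROCESS_AND_TRAVERSE;
}
```

In this reading -h stays orthogonal: it only selects lstat/lgetxattr/llistxattr over stat/getxattr/listxattr for whatever ends up being processed.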
--Tim From owner-xfs@oss.sgi.com Mon Nov 19 21:21:21 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 19 Nov 2007 21:21:24 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAK5LGDa004392 for ; Mon, 19 Nov 2007 21:21:19 -0800 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA01169; Tue, 20 Nov 2007 16:21:16 +1100 Received: by chook.melbourne.sgi.com (Postfix, from userid 16346) id 92D0858C4C0A; Tue, 20 Nov 2007 16:21:16 +1100 (EST) To: sgi.bugs.xfs@engr.sgi.com Cc: xfs@oss.sgi.com Subject: PARTIAL TAKE 971186 - Use kernel-supplied "roundup_pow_of_two" for simplicity Message-Id: <20071120052116.92D0858C4C0A@chook.melbourne.sgi.com> Date: Tue, 20 Nov 2007 16:21:16 +1100 (EST) From: dgc@sgi.com (David Chinner) X-Virus-Scanned: ClamAV 0.91.2/4853/Mon Nov 19 19:05:00 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13708 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Use kernel-supplied "roundup_pow_of_two" for simplicity Signed-off-by: Robert P. J. Day Date: Tue Nov 20 16:20:46 AEDT 2007 Workarea: chook.melbourne.sgi.com:/build/dgc/isms/2.6.x-xfs Inspected by: rpjday@crashcourse.ca The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/linux/2.6.x-xfs-melb Modid: xfs-linux-melb:xfs-kern:30098a fs/xfs/xfs_inode.c - 1.487 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_inode.c.diff?r1=text&tr1=1.487&r2=text&tr2=1.486&f=h - Use roundup_pow_of_two() instead of hand rolling it. 
fs/xfs/xfs_inode.h - 1.238 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_inode.h.diff?r1=text&tr1=1.238&r2=text&tr2=1.237&f=h - Use roundup_pow_of_two() instead of hand rolling it. From owner-xfs@oss.sgi.com Tue Nov 20 01:35:04 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 20 Nov 2007 01:35:08 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from astoria.ccjclearline.com (astoria.ccjclearline.com [64.235.106.9]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAK9Z3GI006436 for ; Tue, 20 Nov 2007 01:35:04 -0800 Received: from [99.236.101.138] (helo=crashcourse.ca) by astoria.ccjclearline.com with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.68) (envelope-from ) id 1IuPVd-0007k0-KZ for xfs@oss.sgi.com; Tue, 20 Nov 2007 04:35:10 -0500 Date: Tue, 20 Nov 2007 04:33:17 -0500 (EST) From: "Robert P. J. Day" X-X-Sender: rpjday@localhost.localdomain To: xfs@oss.sgi.com Subject: [PATCH] XFS: Replace deprecated "TOPDIR" with newer "src" Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - astoria.ccjclearline.com X-AntiAbuse: Original Domain - oss.sgi.com X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - crashcourse.ca X-Source: X-Source-Args: X-Source-Dir: X-Virus-Scanned: ClamAV 0.91.2/4854/Tue Nov 20 00:17:56 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13709 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: rpjday@crashcourse.ca Precedence: bulk X-list: xfs Signed-off-by: Robert P. J. Day --- given that TOPDIR is explicitly deprecated in favour of src, this would seem to be an obvious change. 
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 49e3e7e..e1b08b7 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -1 +1 @@ -include $(TOPDIR)/fs/xfs/Makefile-linux-$(VERSION).$(PATCHLEVEL) +include $(src)/fs/xfs/Makefile-linux-$(VERSION).$(PATCHLEVEL) rday ======================================================================== Robert P. J. Day Linux Consulting, Training and Annoying Kernel Pedantry Waterloo, Ontario, CANADA http://crashcourse.ca ======================================================================== From owner-xfs@oss.sgi.com Tue Nov 20 19:13:13 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 20 Nov 2007 19:13:19 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAL3D6eB010609 for ; Tue, 20 Nov 2007 19:13:12 -0800 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA10499; Wed, 21 Nov 2007 14:13:08 +1100 Received: by chook.melbourne.sgi.com (Postfix, from userid 1116) id A2FFA58C4C0A; Wed, 21 Nov 2007 14:13:08 +1100 (EST) To: sgi.bugs.xfs@engr.sgi.com, xfs@oss.sgi.com Subject: PARTIAL TAKE 973591 - fix up tree walking for symlinks in attr Message-Id: <20071121031308.A2FFA58C4C0A@chook.melbourne.sgi.com> Date: Wed, 21 Nov 2007 14:13:08 +1100 (EST) From: tes@sgi.com (Tim Shimmin) X-Virus-Scanned: ClamAV 0.91.2/4861/Tue Nov 20 18:20:00 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13712 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: tes@sgi.com Precedence: bulk X-list: xfs fix up tree walking for symlinks Date: Wed Nov 21 14:11:54 AEDT 2007 
Workarea: chook.melbourne.sgi.com:/build/tes/xfs-cmds Inspected by: agruen@suse.de The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/xfs-cmds/master-melb Modid: master-melb:xfs-cmds:30105a attr/test/getfattr.test - 1.1 - new http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/attr/test/getfattr.test attr/libmisc/walk_tree.c - 1.1 - new http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/attr/libmisc/walk_tree.c attr/include/walk_tree.h - 1.1 - new http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/attr/include/walk_tree.h attr/VERSION - 1.70 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/attr/VERSION.diff?r1=text&tr1=1.70&r2=text&tr2=1.69&f=h attr/doc/CHANGES - 1.83 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/attr/doc/CHANGES.diff?r1=text&tr1=1.83&r2=text&tr2=1.82&f=h attr/test/attr.test - 1.7 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/attr/test/attr.test.diff?r1=text&tr1=1.7&r2=text&tr2=1.6&f=h attr/setfattr/Makefile - 1.8 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/attr/setfattr/Makefile.diff?r1=text&tr1=1.8&r2=text&tr2=1.7&f=h attr/getfattr/Makefile - 1.9 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/attr/getfattr/Makefile.diff?r1=text&tr1=1.9&r2=text&tr2=1.8&f=h attr/getfattr/getfattr.c - 1.27 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/attr/getfattr/getfattr.c.diff?r1=text&tr1=1.27&r2=text&tr2=1.26&f=h attr/libmisc/Makefile - 1.4 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/attr/libmisc/Makefile.diff?r1=text&tr1=1.4&r2=text&tr2=1.3&f=h From owner-xfs@oss.sgi.com Tue Nov 20 19:48:07 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 20 Nov 2007 19:48:15 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com 
(8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAL3m0Lg016108 for ; Tue, 20 Nov 2007 19:48:06 -0800 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA11482; Wed, 21 Nov 2007 14:48:04 +1100 Received: by chook.melbourne.sgi.com (Postfix, from userid 1116) id 61B8D58C4C0A; Wed, 21 Nov 2007 14:48:04 +1100 (EST) To: xfs@oss.sgi.com, sgi.bugs.xfs@engr.sgi.com Subject: PARTIAL TAKE 973591 - fix up tree walking code with symlinks for acl commands Message-Id: <20071121034804.61B8D58C4C0A@chook.melbourne.sgi.com> Date: Wed, 21 Nov 2007 14:48:04 +1100 (EST) From: tes@sgi.com (Tim Shimmin) X-Virus-Scanned: ClamAV 0.91.2/4861/Tue Nov 20 18:20:00 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13713 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: tes@sgi.com Precedence: bulk X-list: xfs fix up tree walking code with symlinks for acl commands Checking in Andreas' patches. 
Date: Wed Nov 21 14:47:09 AEDT 2007 Workarea: chook.melbourne.sgi.com:/build/tes/xfs-cmds Inspected by: agruen@suse.de The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/xfs-cmds/master-melb Modid: master-melb:xfs-cmds:30109a acl/libmisc/walk_tree.c - 1.1 - new http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/acl/libmisc/walk_tree.c acl/include/walk_tree.h - 1.1 - new http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/acl/include/walk_tree.h acl/VERSION - 1.86 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/acl/VERSION.diff?r1=text&tr1=1.86&r2=text&tr2=1.85&f=h acl/doc/CHANGES - 1.97 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/acl/doc/CHANGES.diff?r1=text&tr1=1.97&r2=text&tr2=1.96&f=h acl/setfacl/setfacl.c - 1.21 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/acl/setfacl/setfacl.c.diff?r1=text&tr1=1.21&r2=text&tr2=1.20&f=h acl/setfacl/do_set.c - 1.15 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/acl/setfacl/do_set.c.diff?r1=text&tr1=1.15&r2=text&tr2=1.14&f=h acl/setfacl/Makefile - 1.12 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/acl/setfacl/Makefile.diff?r1=text&tr1=1.12&r2=text&tr2=1.11&f=h acl/getfacl/getfacl.c - 1.21 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/acl/getfacl/getfacl.c.diff?r1=text&tr1=1.21&r2=text&tr2=1.20&f=h acl/getfacl/Makefile - 1.12 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/acl/getfacl/Makefile.diff?r1=text&tr1=1.12&r2=text&tr2=1.11&f=h acl/libmisc/Makefile - 1.5 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/acl/libmisc/Makefile.diff?r1=text&tr1=1.5&r2=text&tr2=1.4&f=h From owner-xfs@oss.sgi.com Tue Nov 20 19:54:15 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 20 Nov 2007 19:54:22 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com 
(larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAL3s7Jv017022 for ; Tue, 20 Nov 2007 19:54:14 -0800 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA11584; Wed, 21 Nov 2007 14:54:11 +1100 Received: by chook.melbourne.sgi.com (Postfix, from userid 1116) id 6815B58C4C0A; Wed, 21 Nov 2007 14:54:11 +1100 (EST) To: xfs@oss.sgi.com, sgi.bugs.xfs@engr.sgi.com Subject: PARTIAL TAKE 973591 970324 - fix up tree walking with symlinks for attr Message-Id: <20071121035411.6815B58C4C0A@chook.melbourne.sgi.com> Date: Wed, 21 Nov 2007 14:54:11 +1100 (EST) From: tes@sgi.com (Tim Shimmin) X-Virus-Scanned: ClamAV 0.91.2/4861/Tue Nov 20 18:20:00 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13714 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: tes@sgi.com Precedence: bulk X-list: xfs fix up tree walking with symlinks for attr Date: Wed Nov 21 14:53:02 AEDT 2007 Workarea: chook.melbourne.sgi.com:/build/tes/xfs-cmds Inspected by: agruen@suse.de The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/xfs-cmds/master-melb Modid: master-melb:xfs-cmds:30110a xfstests/062.out - 1.12 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/xfstests/062.out.diff?r1=text&tr1=1.12&r2=text&tr2=1.11&f=h - Update the output now that getfattr code has changed with new tree walking code. 
From owner-xfs@oss.sgi.com Tue Nov 20 20:07:32 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 20 Nov 2007 20:07:37 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAL47ScK018806 for ; Tue, 20 Nov 2007 20:07:30 -0800 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA12024; Wed, 21 Nov 2007 15:07:31 +1100 Received: by chook.melbourne.sgi.com (Postfix, from userid 1116) id 40C2258C4C0A; Wed, 21 Nov 2007 15:07:31 +1100 (EST) To: xfs@oss.sgi.com, sgi.bugs.xfs@engr.sgi.com Subject: PARTIAL TAKE 907752 - remove outdated attr ea-conv script Message-Id: <20071121040731.40C2258C4C0A@chook.melbourne.sgi.com> Date: Wed, 21 Nov 2007 15:07:31 +1100 (EST) From: tes@sgi.com (Tim Shimmin) X-Virus-Scanned: ClamAV 0.91.2/4861/Tue Nov 20 18:20:00 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13715 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: tes@sgi.com Precedence: bulk X-list: xfs Remove outdated conversion script ea-conv. Suggested by Andreas Gruenbacher. 
Date: Wed Nov 21 15:06:41 AEDT 2007 Workarea: chook.melbourne.sgi.com:/build/tes/xfs-cmds Inspected by: agruen@suse.de The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/xfs-cmds/master-melb Modid: master-melb:xfs-cmds:30111a attr/doc/CHANGES - 1.84 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/attr/doc/CHANGES.diff?r1=text&tr1=1.84&r2=text&tr2=1.83&f=h attr/doc/Makefile - 1.10 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/attr/doc/Makefile.diff?r1=text&tr1=1.10&r2=text&tr2=1.9&f=h attr/doc/ea-conv/Makefile - 1.6 - deleted http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/attr/doc/ea-conv/Makefile.diff?r1=text&tr1=1.6&r2=text&tr2=1.5&f=h attr/doc/ea-conv/ea-conv - 1.2 - deleted http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/attr/doc/ea-conv/ea-conv.diff?r1=text&tr1=1.2&r2=text&tr2=1.1&f=h attr/doc/ea-conv/README - 1.2 - deleted http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/attr/doc/ea-conv/README.diff?r1=text&tr1=1.2&r2=text&tr2=1.1&f=h - Remove outdated conversion script ea-conv. From owner-xfs@oss.sgi.com Wed Nov 21 07:20:12 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 07:20:19 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lALFJwge007767 for ; Wed, 21 Nov 2007 07:20:12 -0800 Received: from hch by pentafluge.infradead.org with local (Exim 4.63 #1 (Red Hat Linux)) id 1IurN0-0002TF-Ga; Wed, 21 Nov 2007 15:20:06 +0000 Date: Wed, 21 Nov 2007 15:20:06 +0000 From: Christoph Hellwig To: David Chinner Cc: xfs-dev , xfs-oss Subject: Re: [PATCH,RFC] Factor some btree code.... 
Message-ID: <20071121152006.GD8454@infradead.org> References: <20071106091836.GV995458@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071106091836.GV995458@sgi.com> User-Agent: Mutt/1.4.2.3i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-Virus-Scanned: ClamAV 0.91.2/4872/Wed Nov 21 00:36:49 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13717 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs I like this. But I think a single set of xfs_btree_ops would be a lot more readable. Also I'm not sure we actually need all these ops, e.g. instead of .buf_to_block we could always just call XFS_BUF_PTR directly. From owner-xfs@oss.sgi.com Wed Nov 21 07:17:43 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 07:17:54 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lALFHd0P007356 for ; Wed, 21 Nov 2007 07:17:43 -0800 Received: from hch by pentafluge.infradead.org with local (Exim 4.63 #1 (Red Hat Linux)) id 1IurKl-0002QW-2r; Wed, 21 Nov 2007 15:17:47 +0000 Date: Wed, 21 Nov 2007 15:17:47 +0000 From: Christoph Hellwig To: Lachlan McIlroy Cc: David Chinner , xfs-dev , xfs-oss Subject: Re: [PATCH] bulkstat fixups Message-ID: <20071121151747.GC8454@infradead.org> References: <4733EEF2.9010504@sgi.com> <20071111214759.GS995458@sgi.com> <4737C11D.8030007@sgi.com> <20071112041121.GT66820511@sgi.com> <473D1DE0.1090106@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: 
inline In-Reply-To: <473D1DE0.1090106@sgi.com> User-Agent: Mutt/1.4.2.3i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-Virus-Scanned: ClamAV 0.91.2/4872/Wed Nov 21 00:36:49 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13716 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs +#define XFS_BULKSTAT_UBLEFT(ubleft) ((ubleft) >= statstruct_size) + I don't think this macro is really helpful. An inline would have been useful if statstruct_size was constant, but this way it's much better to just write out the comparison the four times it's used. + if (!ubcountp || *ubcountp <= 0) { + return EINVAL; + } No need for the braces here. I also must say I don't like the cond_resched() calls very much. They look entirely random to me. We really should only need cond_resched when it's absolutely needed, and it deserves a comment why it's needed then.
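As a rough illustration of the two style points above, here is a small user-space sketch in C. The helper names check_ubcount() and records_that_fit() are hypothetical and are not part of the bulkstat patch; they only show the comparison written out in place of the macro, and the single-statement if without braces.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper: argument check with no braces around the single statement. */
static int check_ubcount(const int *ubcountp)
{
	if (!ubcountp || *ubcountp <= 0)
		return EINVAL;	/* positive errno, as in the quoted hunk */
	return 0;
}

/*
 * Hypothetical helper: count how many stat records still fit in the
 * user buffer, with the comparison written out instead of hidden
 * behind XFS_BULKSTAT_UBLEFT().
 */
static int records_that_fit(size_t ubleft, size_t statstruct_size)
{
	int n = 0;

	assert(statstruct_size > 0);
	while (ubleft >= statstruct_size) {
		ubleft -= statstruct_size;
		n++;
	}
	return n;
}
```

Written this way, the dependency on statstruct_size is visible at each call site rather than being captured implicitly by a macro.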
From owner-xfs@oss.sgi.com Wed Nov 21 07:21:49 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 07:22:00 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lALFLkfX008158 for ; Wed, 21 Nov 2007 07:21:48 -0800 Received: from hch by pentafluge.infradead.org with local (Exim 4.63 #1 (Red Hat Linux)) id 1IurOj-0002Uh-HG; Wed, 21 Nov 2007 15:21:53 +0000 Date: Wed, 21 Nov 2007 15:21:53 +0000 From: Christoph Hellwig To: Lachlan McIlroy Cc: Christoph Hellwig , xfs-dev , xfs-oss Subject: Re: [PATCH] Turn off XBF_READ_AHEAD in io completion Message-ID: <20071121152153.GE8454@infradead.org> References: <47296FF7.8080607@sgi.com> <20071101100012.GA20065@infradead.org> <4733B1CA.9030109@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4733B1CA.9030109@sgi.com> User-Agent: Mutt/1.4.2.3i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-Virus-Scanned: ClamAV 0.91.2/4872/Wed Nov 21 00:36:49 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13718 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs On Fri, Nov 09, 2007 at 12:03:06PM +1100, Lachlan McIlroy wrote: > Okay, I've done that (new patch attached). It's certainly not as > clean as the last patch. Yes, it doesn't really look like an improvement. I'd go with your previous patch and see if we can find a better way to sort out the buffer flags mess later. 
From owner-xfs@oss.sgi.com Wed Nov 21 07:22:08 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 07:22:18 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_13 autolearn=no version=3.3.0-r574664 Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lALFLs6C008190 for ; Wed, 21 Nov 2007 07:22:07 -0800 Received: from hch by pentafluge.infradead.org with local (Exim 4.63 #1 (Red Hat Linux)) id 1Iur8o-0002G7-7H; Wed, 21 Nov 2007 15:05:26 +0000 Date: Wed, 21 Nov 2007 15:05:26 +0000 From: Christoph Hellwig To: Barry Naujok Cc: "xfs@oss.sgi.com" , xfs-dev Subject: Re: [REVIEW] Refactor xfs_repair's process_dinode_int Message-ID: <20071121150526.GA8454@infradead.org> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-Virus-Scanned: ClamAV 0.91.2/4872/Wed Nov 21 00:36:49 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13719 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs On Thu, Nov 15, 2007 at 05:40:41PM +1100, Barry Naujok wrote: > Implementing casefold-table checking in xfs_repair, I have to > touch process_dinode_int. It's a horrendous function. The attached > patch hopefully makes it much clearer what it does and removes a > lot of duplicate code when bad inodes are found. There are some > obscure bug fixes too (eg. two places where the inode's di_mode is > updated, but not marked dirty - libxfs would have tossed it). 
> > The refactoring involved removing unused variables, working out > what various variables actually did and use them appropriately > and break blocks of functionality into separate functions. This looks very good from a quick glance over it. I can't claim I've verified that it still does the same thing, though. From owner-xfs@oss.sgi.com Wed Nov 21 07:36:55 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 07:37:05 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lALFasZE011719 for ; Wed, 21 Nov 2007 07:36:55 -0800 Received: from hch by pentafluge.infradead.org with local (Exim 4.63 #1 (Red Hat Linux)) id 1IurB5-0002I0-0Z; Wed, 21 Nov 2007 15:07:47 +0000 Date: Wed, 21 Nov 2007 15:07:46 +0000 From: Christoph Hellwig To: "J.
Bruce Fields" Cc: Christoph Hellwig , Chris Wedgwood , linux-xfs@oss.sgi.com, LKML Subject: Re: 2.6.24-rc2 XFS nfsd hang Message-ID: <20071121150746.GB8454@infradead.org> References: <20071114070400.GA25708@puku.stupidest.org> <20071114152952.GA4210@infradead.org> <20071114173922.GC14254@fieldses.org> <20071114174419.GA15271@infradead.org> <20071114175322.GD14254@fieldses.org> <20071114180241.GA16656@infradead.org> <20071114180838.GE14254@fieldses.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071114180838.GE14254@fieldses.org> User-Agent: Mutt/1.4.2.3i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-Virus-Scanned: ClamAV 0.91.2/4872/Wed Nov 21 00:36:49 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13720 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs On Wed, Nov 14, 2007 at 01:08:38PM -0500, J. Bruce Fields wrote: > > Personally I'd prefer it to only grow a struct stat or rather it's members > > But the nfsd code currently expects a dentry so this might require some > > major refactoring. > > Well, we need to check for mountpoints, for example, so I don't see any > way out of needing a dentry. What's the drawback? You're right - we'd probably need the dentry. The drawback is that we need to always get it in the dcache. Which might be a good thing depending on the workload. 
From owner-xfs@oss.sgi.com Wed Nov 21 08:08:48 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 08:09:07 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-0.2 required=5.0 tests=BAYES_40 autolearn=ham version=3.3.0-r574664 Received: from c3po.klbg.n.redcross.or.at (c3po.klbg.n.redcross.or.at [195.202.144.145]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lALG8db9016638 for ; Wed, 21 Nov 2007 08:08:48 -0800 Received: from localhost (localhost [127.0.0.1]) by c3po.klbg.n.redcross.or.at (Postfix) with ESMTP id C590F90C; Wed, 21 Nov 2007 16:39:29 +0100 (CET) Received: by c3po.klbg.n.redcross.or.at (Postfix, from userid 1020) id ED05E8C8; Wed, 21 Nov 2007 16:39:20 +0100 (CET) Received: from omnibach.wu-wien.ac.at (omnibach.intern.klbg.n.roteskreuz.at [192.168.60.150]) by c3po.klbg.n.redcross.or.at (Postfix) with ESMTP id B817F329; Wed, 21 Nov 2007 16:39:19 +0100 (CET) Message-ID: <47445130.5070103@strike.wu-wien.ac.at> Date: Wed, 21 Nov 2007 16:39:28 +0100 From: "Alexander 'Leo' Bergolth" User-Agent: Thunderbird 2.0.0.5 (X11/20070727) MIME-Version: 1.0 To: Chris Wedgwood Cc: Emmanuel Florac , Eric Sandeen , xfs@oss.sgi.com Subject: Re: XFS crash on linux raid References: <20070503164521.16efe075@harpe.intellique.com> <20070504005922.GC32602149@melbourne.sgi.com> <20070504090613.7c0f97d3@galadriel.home> <20070504073344.GL32602149@melbourne.sgi.com> <20070504152546.614374ac@harpe.intellique.com> <463B4962.70904@sandeen.net> <20070504173049.14606033@harpe.intellique.com> <20070504232028.GA19744@tuatara.stupidest.org> <4741D198.2060908@strike.wu-wien.ac.at> <20071119234415.GB29042@puku.stupidest.org> In-Reply-To: <20071119234415.GB29042@puku.stupidest.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4872/Wed Nov 21 00:36:49 2007 on oss.sgi.com X-Virus-Scanned: by amavisd-new 
at klbg.n.redcross.or.at X-Virus-Status: Clean X-archive-position: 13721 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: leo@strike.wu-wien.ac.at Precedence: bulk X-list: xfs On 11/20/2007 12:44 AM, Chris Wedgwood wrote: > On Mon, Nov 19, 2007 at 07:10:32PM +0100, Alexander Bergolth wrote: >> Several months ago, you posted a patch for 8k stacks together with >> irqstacks: > > I'll take a look. Can you make sure it works with 8K stacks (more or > less works). It works (more or less) fine without the patch that splits irqstacks from 4k stacks. (I tried it both with CONFIG_4KSTACKS set and unset.) Thanks, --leo -- e-mail ::: Alexander.Bergolth (at) wu-wien.ac.at fax ::: +43-1-31336-906050 location ::: Computer Center | Vienna University of Economics | Austria From owner-xfs@oss.sgi.com Wed Nov 21 11:03:52 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 11:04:10 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from fieldses.org (mail.fieldses.org [66.93.2.214]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lALJ3m07013520 for ; Wed, 21 Nov 2007 11:03:52 -0800 Received: from bfields by fieldses.org with local (Exim 4.68) (envelope-from ) id 1IuurW-00035d-It; Wed, 21 Nov 2007 14:03:50 -0500 Date: Wed, 21 Nov 2007 14:03:50 -0500 To: Christoph Hellwig Cc: Chris Wedgwood , linux-xfs@oss.sgi.com, LKML Subject: Re: 2.6.24-rc2 XFS nfsd hang Message-ID: <20071121190350.GA28029@fieldses.org> References: <20071114070400.GA25708@puku.stupidest.org> <20071114152952.GA4210@infradead.org> <20071114173922.GC14254@fieldses.org> <20071114174419.GA15271@infradead.org> <20071114175322.GD14254@fieldses.org> <20071114180241.GA16656@infradead.org> <20071114180838.GE14254@fieldses.org> <20071121150746.GB8454@infradead.org> 
MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071121150746.GB8454@infradead.org> User-Agent: Mutt/1.5.17 (2007-11-01) From: "J. Bruce Fields" X-Virus-Scanned: ClamAV 0.91.2/4872/Wed Nov 21 00:36:49 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13722 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: bfields@fieldses.org Precedence: bulk X-list: xfs On Wed, Nov 21, 2007 at 03:07:46PM +0000, Christoph Hellwig wrote: > On Wed, Nov 14, 2007 at 01:08:38PM -0500, J. Bruce Fields wrote: > > > Personally I'd prefer it to only grow a struct stat or rather it's members > > > But the nfsd code currently expects a dentry so this might require some > > > major refactoring. > > > > Well, we need to check for mountpoints, for example, so I don't see any > > way out of needing a dentry. What's the drawback? > > You're right - we'd probably need the dentry. The drawback is that > we need to always get it in the dcache. Which might be a good thing > depending on the workload. In any case, if the new api were only used by nfsd for now, then there'd be no change here. Seems like it might be worth a try. --b. 
From owner-xfs@oss.sgi.com Wed Nov 21 13:31:15 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 13:31:19 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lALLV8ZA014131 for ; Wed, 21 Nov 2007 13:31:14 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id IAA09561; Thu, 22 Nov 2007 08:31:12 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lALLVBdD115516814; Thu, 22 Nov 2007 08:31:12 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lALLVBg3115206848; Thu, 22 Nov 2007 08:31:11 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Thu, 22 Nov 2007 08:31:11 +1100 From: David Chinner To: Christoph Hellwig Cc: Lachlan McIlroy , David Chinner , xfs-dev , xfs-oss Subject: Re: [PATCH] bulkstat fixups Message-ID: <20071121213110.GD114266761@sgi.com> References: <4733EEF2.9010504@sgi.com> <20071111214759.GS995458@sgi.com> <4737C11D.8030007@sgi.com> <20071112041121.GT66820511@sgi.com> <473D1DE0.1090106@sgi.com> <20071121151747.GC8454@infradead.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071121151747.GC8454@infradead.org> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4874/Wed Nov 21 12:33:19 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13723 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com 
Precedence: bulk X-list: xfs On Wed, Nov 21, 2007 at 03:17:47PM +0000, Christoph Hellwig wrote: > +#define XFS_BULKSTAT_UBLEFT(ubleft) ((ubleft) >= statstruct_size) > + > > I don't think this macro is really helpful. An inline would > have been useful if statstruct_size was constant, but this > way it's much better to just write out the comparism the four > times it's used. > > + if (!ubcountp || *ubcountp <= 0) { > + return EINVAL; > + } > > No need for the braces here. > > > I also must say I don't like the cond_resched() calls very much. They > look entirely random to me. We really should only need cond_resched > when it's absolutely needed, and it deserves a comment why it's needed > then. I think I mentioned that we don't need them in the innermost loop. The reason for adding them is that if the inode clusters are in cache, bulkstat will not yield the cpu at all and so holds off other things from operating on that CPU. And when bulkstat has got itself stuck in a loop, if it's running on the same CPU as I/O completion events are running on (i.e. disk interrupts delivered to) it basically hangs the filesystem. If that CPU is taking interrupts for your root filesystem.... Cheers, Dave. 
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Wed Nov 21 13:42:11 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 13:42:14 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lALLg5lb016305 for ; Wed, 21 Nov 2007 13:42:09 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id IAA09717; Thu, 22 Nov 2007 08:42:10 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lALLg9dD115502352; Thu, 22 Nov 2007 08:42:09 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lALLg8o2114761623; Thu, 22 Nov 2007 08:42:08 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Thu, 22 Nov 2007 08:42:08 +1100 From: David Chinner To: Christoph Hellwig Cc: David Chinner , xfs-dev , xfs-oss Subject: Re: [PATCH,RFC] Factor some btree code.... 
Message-ID: <20071121214208.GE114266761@sgi.com> References: <20071106091836.GV995458@sgi.com> <20071121152006.GD8454@infradead.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071121152006.GD8454@infradead.org> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4874/Wed Nov 21 12:33:19 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13724 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Wed, Nov 21, 2007 at 03:20:06PM +0000, Christoph Hellwig wrote: > > I like this. But I think a single set of xfs_btree_ops would be a lot > more readable. Also I'm not sure we actually need all these ops, e.g. > instead of .buf_to_block we could always just call XFS_BUF_PTR directly. I've already converted it to a single ops structure and cut out quite a few ops that could be generic (sibling ops, checking ops, numrecs ops, etc). I opted for moving them all over initially and now I'm culling the common stuff. .buf_to_block is another one that can go, .buf_to_ptr can be done generically, too.... I'm still flushing bugs out, so I haven't posted an update. When I get it to pass xfsqa I'll send out an updated patchset. Cheers, Dave. 
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Wed Nov 21 16:31:04 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 16:31:07 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_23 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAM0UwIQ010129 for ; Wed, 21 Nov 2007 16:31:03 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA13927; Thu, 22 Nov 2007 11:31:04 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAM0V2dD115535635; Thu, 22 Nov 2007 11:31:03 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAM0V0jG115565281; Thu, 22 Nov 2007 11:31:00 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Thu, 22 Nov 2007 11:31:00 +1100 From: David Chinner To: xfs-oss Cc: lkml Subject: [PATCH 0/9]: Various XFS inode clustering improvements Message-ID: <20071122003100.GF114266761@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4874/Wed Nov 21 12:33:19 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13725 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Normally I wouldn't bother cc'ing lkml on XFS changes, however a couple of these patches touch generic code. 
The changes to generic code introduce a WRITE_META bio type and radix_tree_gang_lookup_range(), hence the wider distribution. This patch set is against the current xfs-dev tree so bits of it may not apply to current mainline. Overall, the patch set is focussed on improving the XFS inode cache and clustering code. It reduces memory usage of the cache by 5-10% and improves performance on some workloads by 10-15%. Comments welcome. Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Wed Nov 21 16:32:14 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 16:32:18 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAM0W7s4010383 for ; Wed, 21 Nov 2007 16:32:12 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA13981; Thu, 22 Nov 2007 11:32:14 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAM0WDdD115432572; Thu, 22 Nov 2007 11:32:13 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAM0WCiD111491649; Thu, 22 Nov 2007 11:32:12 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Thu, 22 Nov 2007 11:32:11 +1100 From: David Chinner To: xfs-oss Cc: lkml Subject: [PATCH 1/9]: introduce radix_tree_gang_lookup_range Message-ID: <20071122003211.GG114266761@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 
0.91.2/4874/Wed Nov 21 12:33:19 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13726 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Introduce radix_tree_gang_lookup_range() The inode clustering in XFS requires a gang lookup on the radix tree to find all the inodes in the cluster. The gang lookup has to set the maximum items to that of a fully populated cluster so we get all the inodes in the cluster, but we only populate the radix tree sparsely (on demand). As a result, the gang lookup can search way, way past the index of the end of the cluster because it is looking for a fixed number of entries to return. We know we want to terminate the search at either a specific index or a maximum number of items, so we need to add a "last_index" parameter to the lookup. Furthermore, the existing radix_tree_gang_lookup() can use this same function if we define a RADIX_TREE_MAX_KEY value so the search is not limited by the last_index. Signed-off-by: Dave Chinner --- include/linux/radix-tree.h | 7 ++++- lib/radix-tree.c | 55 ++++++++++++++++++++++++++++++++++++--------- 2 files changed, 51 insertions(+), 11 deletions(-) Index: 2.6.x-xfs-new/include/linux/radix-tree.h =================================================================== --- 2.6.x-xfs-new.orig/include/linux/radix-tree.h 2007-11-22 10:25:23.834502553 +1100 +++ 2.6.x-xfs-new/include/linux/radix-tree.h 2007-11-22 10:31:46.689597763 +1100 @@ -98,10 +98,11 @@ do { \ * radix_tree_lookup * radix_tree_tag_get * radix_tree_gang_lookup + * radix_tree_gang_lookup_range * radix_tree_gang_lookup_tag * radix_tree_tagged * - * The first 4 functions are able to be called locklessly, using RCU. The + * The first 5 functions are able to be called locklessly, using RCU. The * caller must ensure calls to these functions are made within rcu_read_lock() * regions. 
Other readers (lock-free or otherwise) and modifications may be * running concurrently. @@ -155,6 +156,10 @@ void *radix_tree_delete(struct radix_tre unsigned int radix_tree_gang_lookup(struct radix_tree_root *root, void **results, unsigned long first_index, unsigned int max_items); +unsigned int +radix_tree_gang_lookup_range(struct radix_tree_root *root, void **results, + unsigned long first_index, unsigned long last_index, + unsigned int max_items); int radix_tree_preload(gfp_t gfp_mask); void radix_tree_init(void); void *radix_tree_tag_set(struct radix_tree_root *root, Index: 2.6.x-xfs-new/lib/radix-tree.c =================================================================== --- 2.6.x-xfs-new.orig/lib/radix-tree.c 2007-11-22 10:31:24.564425190 +1100 +++ 2.6.x-xfs-new/lib/radix-tree.c 2007-11-22 10:31:46.693597252 +1100 @@ -62,6 +62,8 @@ struct radix_tree_path { #define RADIX_TREE_INDEX_BITS (8 /* CHAR_BIT */ * sizeof(unsigned long)) #define RADIX_TREE_MAX_PATH (RADIX_TREE_INDEX_BITS/RADIX_TREE_MAP_SHIFT + 2) +#define RADIX_TREE_MAX_KEY ~0UL + static unsigned long height_to_maxindex[RADIX_TREE_MAX_PATH] __read_mostly; /* @@ -599,7 +601,8 @@ EXPORT_SYMBOL(radix_tree_tag_get); static unsigned int __lookup(struct radix_tree_node *slot, void **results, unsigned long index, - unsigned int max_items, unsigned long *next_index) + unsigned long last_index, unsigned int max_items, + unsigned long *next_index) { unsigned int nr_found = 0; unsigned int shift, height; @@ -640,6 +643,8 @@ __lookup(struct radix_tree_node *slot, v if (nr_found == max_items) goto out; } + if (index > last_index) + goto out; } out: *next_index = index; @@ -647,27 +652,29 @@ out: } /** - * radix_tree_gang_lookup - perform multiple lookup on a radix tree + * radix_tree_gang_lookup_range - perform multiple lookup on a radix tree * @root: radix tree root * @results: where the results of the lookup are placed * @first_index: start the lookup from this key + * @last_index: end the lookup at this key * 
@max_items: place up to this many items at *results * - * Performs an index-ascending scan of the tree for present items. Places - * them at *@results and returns the number of items which were placed at - * *@results. + * Performs an index-ascending scan of the tree for present items up to + * @last_index in the tree. Places them at *@results and returns the + * number of items which were placed at *@results. * * The implementation is naive. * - * Like radix_tree_lookup, radix_tree_gang_lookup may be called under + * Like radix_tree_lookup, radix_tree_gang_lookup_range may be called under * rcu_read_lock. In this case, rather than the returned results being * an atomic snapshot of the tree at a single point in time, the semantics * of an RCU protected gang lookup are as though multiple radix_tree_lookups * have been issued in individual locks, and results stored in 'results'. */ unsigned int -radix_tree_gang_lookup(struct radix_tree_root *root, void **results, - unsigned long first_index, unsigned int max_items) +radix_tree_gang_lookup_range(struct radix_tree_root *root, void **results, + unsigned long first_index, unsigned long last_index, + unsigned int max_items) { unsigned long max_index; struct radix_tree_node *node; @@ -686,7 +693,7 @@ radix_tree_gang_lookup(struct radix_tree return 1; } - max_index = radix_tree_maxindex(node->height); + max_index = min(last_index, radix_tree_maxindex(node->height)); ret = 0; while (ret < max_items) { @@ -695,7 +702,7 @@ radix_tree_gang_lookup(struct radix_tree if (cur_index > max_index) break; - nr_found = __lookup(node, results + ret, cur_index, + nr_found = __lookup(node, results + ret, cur_index, max_index, max_items - ret, &next_index); ret += nr_found; if (next_index == 0) @@ -705,6 +712,34 @@ radix_tree_gang_lookup(struct radix_tree return ret; } +EXPORT_SYMBOL(radix_tree_gang_lookup_range); + +/** + * radix_tree_gang_lookup - perform multiple lookup on a radix tree + * @root: radix tree root + * @results: where the 
results of the lookup are placed + * @first_index: start the lookup from this key + * @max_items: place up to this many items at *results + * + * Performs an index-ascending scan of the tree for present items. Places + * them at *@results and returns the number of items which were placed at + * *@results. + * + * The implementation is naive. + * + * Like radix_tree_lookup, radix_tree_gang_lookup may be called under + * rcu_read_lock. In this case, rather than the returned results being + * an atomic snapshot of the tree at a single point in time, the semantics + * of an RCU protected gang lookup are as though multiple radix_tree_lookups + * have been issued in individual locks, and results stored in 'results'. + */ +unsigned int +radix_tree_gang_lookup(struct radix_tree_root *root, void **results, + unsigned long first_index, unsigned int max_items) +{ + return radix_tree_gang_lookup_range(root, results, first_index, + RADIX_TREE_MAX_KEY, max_items); +} EXPORT_SYMBOL(radix_tree_gang_lookup); /* From owner-xfs@oss.sgi.com Wed Nov 21 16:33:39 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 16:33:44 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAM0XaOg010977 for ; Wed, 21 Nov 2007 16:33:38 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA14088; Thu, 22 Nov 2007 11:33:42 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAM0XfdD113926212; Thu, 22 Nov 2007 11:33:41 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com 
(SGI-8.12.5/8.12.5/Submit) id lAM0XdHd115288164; Thu, 22 Nov 2007 11:33:39 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Thu, 22 Nov 2007 11:33:39 +1100 From: David Chinner To: xfs-oss Cc: lkml Subject: [PATCH 2/9]: Reduce Log I/O latency Message-ID: <20071122003339.GH114266761@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4874/Wed Nov 21 12:33:19 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13727 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Reduce log I/O latency To ensure that log I/O is issued as the highest priority I/O, set the I/O priority of the log I/O to the highest possible. This will ensure that log I/O is not held up behind bulk data or other metadata I/O as delaying log I/O can pause the entire transaction subsystem. Introduce a new buffer flag to allow us to tag the log buffers so we can discriminate when issuing the I/O. Signed-off-by: Dave Chinner --- fs/xfs/linux-2.6/xfs_buf.c | 3 +++ fs/xfs/linux-2.6/xfs_buf.h | 5 ++++- fs/xfs/xfs_log.c | 2 ++ 3 files changed, 9 insertions(+), 1 deletion(-) Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_buf.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_buf.c 2007-11-22 10:47:21.937396362 +1100 +++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_buf.c 2007-11-22 10:53:11.556186722 +1100 @@ -1255,6 +1255,9 @@ next_chunk: submit_io: if (likely(bio->bi_size)) { + /* log I/O should not be delayed by anything. 
*/ + if (bp->b_flags & XBF_LOG_BUFFER) + bio_set_prio(bio, IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, 0)); submit_bio(rw, bio); if (size) goto next_chunk; Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_buf.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_buf.h 2007-11-22 10:47:21.945395328 +1100 +++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_buf.h 2007-11-22 10:53:11.556186722 +1100 @@ -53,7 +53,8 @@ typedef enum { XBF_DELWRI = (1 << 6), /* buffer has dirty pages */ XBF_STALE = (1 << 7), /* buffer has been staled, do not find it */ XBF_FS_MANAGED = (1 << 8), /* filesystem controls freeing memory */ - XBF_ORDERED = (1 << 11), /* use ordered writes */ + XBF_LOG_BUFFER = (1 << 9), /* Buffer issued by the log */ + XBF_ORDERED = (1 << 11), /* use ordered writes */ XBF_READ_AHEAD = (1 << 12), /* asynchronous read-ahead */ /* flags used only as arguments to access routines */ @@ -340,6 +341,8 @@ extern void xfs_buf_trace(xfs_buf_t *, c #define XFS_BUF_TARGET(bp) ((bp)->b_target) #define XFS_BUFTARG_NAME(target) xfs_buf_target_name(target) +#define XFS_BUF_SET_LOGBUF(bp) ((bp)->b_flags |= XBF_LOG_BUFFER) + static inline int xfs_bawrite(void *mp, xfs_buf_t *bp) { bp->b_fspriv3 = mp; Index: 2.6.x-xfs-new/fs/xfs/xfs_log.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_log.c 2007-11-22 10:47:21.945395328 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_log.c 2007-11-22 10:53:11.556186722 +1100 @@ -1443,6 +1443,8 @@ xlog_sync(xlog_t *log, XFS_BUF_ZEROFLAGS(bp); XFS_BUF_BUSY(bp); XFS_BUF_ASYNC(bp); + XFS_BUF_SET_LOGBUF(bp); + /* * Do an ordered write for the log block. * Its unnecessary to flush the first split block in the log wrap case. 
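The tagging scheme in the patch above can be illustrated outside the kernel. What follows is a minimal user-space sketch, not the patch itself and not kernel code: the names buf_flags, io_prio, struct buf, buf_set_logbuf() and buf_io_prio() are hypothetical, and the numeric flag and priority values are assumptions chosen for illustration. It shows only the idea of marking a buffer once where it is built and deciding its priority from that mark where the I/O is issued.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits, loosely modelled on the XBF_* flags in the patch. */
enum buf_flags {
	BUF_ASYNC      = 1u << 4,	/* illustrative value only */
	BUF_LOG_BUFFER = 1u << 9,	/* buffer issued by the log */
	BUF_ORDERED    = 1u << 11,	/* use ordered writes */
};

/* Hypothetical priority levels; a lower value means more urgent. */
enum io_prio {
	IO_PRIO_URGENT  = 0,
	IO_PRIO_DEFAULT = 4,
};

/* Stand-in for xfs_buf_t; only the flags word matters here. */
struct buf {
	unsigned int flags;
};

/* Tag a buffer as log I/O, in the spirit of XFS_BUF_SET_LOGBUF(). */
static void buf_set_logbuf(struct buf *bp)
{
	assert(bp != NULL);
	bp->flags |= BUF_LOG_BUFFER;
}

/* At submission time, derive the priority from the tag alone. */
static enum io_prio buf_io_prio(const struct buf *bp)
{
	assert(bp != NULL);
	return (bp->flags & BUF_LOG_BUFFER) ? IO_PRIO_URGENT : IO_PRIO_DEFAULT;
}
```

Keeping the decision in one place at submission time means the code that builds the log buffer only has to set a flag. In the patch itself the urgent case corresponds to the bio_set_prio() call with IOPRIO_CLASS_RT.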
From owner-xfs@oss.sgi.com Wed Nov 21 16:35:11 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 16:35:15 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAM0Z8oh011661 for ; Wed, 21 Nov 2007 16:35:10 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA14165; Thu, 22 Nov 2007 11:35:16 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAM0ZEdD115317234; Thu, 22 Nov 2007 11:35:14 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAM0ZC6h114930013; Thu, 22 Nov 2007 11:35:12 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Thu, 22 Nov 2007 11:35:12 +1100 From: David Chinner To: xfs-oss Cc: lkml Subject: [PATCH 3/9] Use _META bio I/O types for metadata I/O Message-ID: <20071122003512.GI114266761@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4874/Wed Nov 21 12:33:19 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13728 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Improve metadata I/O merging in the elevator Change all async metadata buffers to use [READ|WRITE]_META I/O types so that the I/O doesn't get issued immediately. This allows merging of adjacent metadata requests but still prioritises them over bulk data. 
This shows a 10-15% improvement in sequential create speed of small files. Don't include the log buffers in this classification - leave them as sync types so they are issued immediately. Signed-off-by: Dave Chinner --- fs/xfs/linux-2.6/xfs_buf.c | 6 +++++- include/linux/fs.h | 1 + 2 files changed, 6 insertions(+), 1 deletion(-) Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_buf.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_buf.c 2007-11-22 10:53:11.556186722 +1100 +++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_buf.c 2007-11-22 10:53:43.748024392 +1100 @@ -1175,10 +1175,14 @@ _xfs_buf_ioapply( if (bp->b_flags & XBF_ORDERED) { ASSERT(!(bp->b_flags & XBF_READ)); rw = WRITE_BARRIER; - } else if (bp->b_flags & _XBF_RUN_QUEUES) { + } else if (bp->b_flags & XBF_LOG_BUFFER) { ASSERT(!(bp->b_flags & XBF_READ_AHEAD)); bp->b_flags &= ~_XBF_RUN_QUEUES; rw = (bp->b_flags & XBF_WRITE) ? WRITE_SYNC : READ_SYNC; + } else if (bp->b_flags & _XBF_RUN_QUEUES) { + ASSERT(!(bp->b_flags & XBF_READ_AHEAD)); + bp->b_flags &= ~_XBF_RUN_QUEUES; + rw = (bp->b_flags & XBF_WRITE) ? WRITE_META : READ_META; } else { rw = (bp->b_flags & XBF_WRITE) ? WRITE : (bp->b_flags & XBF_READ_AHEAD) ? 
READA : READ; Index: 2.6.x-xfs-new/include/linux/fs.h =================================================================== --- 2.6.x-xfs-new.orig/include/linux/fs.h 2007-11-22 10:47:21.965392742 +1100 +++ 2.6.x-xfs-new/include/linux/fs.h 2007-11-22 10:53:43.748024392 +1100 @@ -83,6 +83,7 @@ extern int dir_notify_enable; #define READ_SYNC (READ | (1 << BIO_RW_SYNC)) #define READ_META (READ | (1 << BIO_RW_META)) #define WRITE_SYNC (WRITE | (1 << BIO_RW_SYNC)) +#define WRITE_META (WRITE | (1 << BIO_RW_META)) #define WRITE_BARRIER ((1 << BIO_RW) | (1 << BIO_RW_BARRIER)) #define SEL_IN 1 From owner-xfs@oss.sgi.com Wed Nov 21 16:36:42 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 16:36:45 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_63 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAM0acui012169 for ; Wed, 21 Nov 2007 16:36:40 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA14225; Thu, 22 Nov 2007 11:36:45 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAM0aidD115015455; Thu, 22 Nov 2007 11:36:44 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAM0agh2115533225; Thu, 22 Nov 2007 11:36:42 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Thu, 22 Nov 2007 11:36:42 +1100 From: David Chinner To: xfs-oss Cc: lkml Subject: [PATCH 4/9] Factor common inode cluster buffer lookup code Message-ID: <20071122003642.GJ114266761@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; 
charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4874/Wed Nov 21 12:33:19 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13729 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Factor xfs_itobp() and xfs_inotobp(). The only difference between the functions is one passes an inode for the lookup, the other passes an inode number. However, they don't do the same validity checking or set all the same state on the buffer that is returned yet they should. Factor the functions into a common implementation. Signed-off-by: Dave Chinner --- fs/xfs/xfs_inode.c | 283 ++++++++++++++++++++++++----------------------------- 1 file changed, 129 insertions(+), 154 deletions(-) Index: 2.6.x-xfs-new/fs/xfs/xfs_inode.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_inode.c 2007-11-22 10:31:44.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_inode.c 2007-11-22 10:33:43.014729931 +1100 @@ -124,6 +124,126 @@ xfs_inobp_check( #endif /* + * Simple wrapper for calling xfs_imap() that includes error + * and bounds checking + */ +STATIC int +xfs_ino_to_imap( + xfs_mount_t *mp, + xfs_trans_t *tp, + xfs_ino_t ino, + xfs_imap_t *imap, + uint imap_flags) +{ + int error; + + error = xfs_imap(mp, tp, ino, imap, imap_flags); + if (error) { + cmn_err(CE_WARN, "xfs_ino_to_imap: xfs_imap() returned an " + "error %d on %s. Returning error.", + error, mp->m_fsname); + return error; + } + + /* + * If the inode number maps to a block outside the bounds + * of the file system then return NULL rather than calling + * read_buf and panicing when we get an error from the + * driver. 
+ */ + if ((imap->im_blkno + imap->im_len) > + XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks)) { + xfs_fs_cmn_err(CE_ALERT, mp, "xfs_ino_to_imap: " + "(imap->im_blkno (0x%llx) + imap->im_len (0x%llx)) > " + " XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) (0x%llx)", + (unsigned long long) imap->im_blkno, + (unsigned long long) imap->im_len, + XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks)); + return XFS_ERROR(EINVAL); + } + return 0; +} + +/* + * Find the buffer associated with the given inode map + * We do basic validation checks on the buffer once it has been + * retrieved from disk. + */ +STATIC int +xfs_imap_to_bp( + xfs_mount_t *mp, + xfs_trans_t *tp, + xfs_imap_t *imap, + xfs_buf_t **bpp, + uint buf_flags, + uint imap_flags) +{ + int error; + int i; + int ni; + xfs_buf_t *bp; + + error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp, imap->im_blkno, + (int)imap->im_len, XFS_BUF_LOCK, &bp); + if (error) { + cmn_err(CE_WARN, "xfs_imap_to_bp: xfs_trans_read_buf()returned " + "an error %d on %s. Returning error.", + error, mp->m_fsname); + return error; + } + + /* + * Validate the magic number and version of every inode in the buffer + * (if DEBUG kernel) or the first inode in the buffer, otherwise. 
+ */ +#ifdef DEBUG + ni = BBTOB(imap->im_len) >> mp->m_sb.sb_inodelog; +#else /* usual case */ + ni = 1; +#endif + + for (i = 0; i < ni; i++) { + int di_ok; + xfs_dinode_t *dip; + + dip = (xfs_dinode_t *)xfs_buf_offset(bp, + (i << mp->m_sb.sb_inodelog)); + di_ok = be16_to_cpu(dip->di_core.di_magic) == XFS_DINODE_MAGIC && + XFS_DINODE_GOOD_VERSION(dip->di_core.di_version); + if (unlikely(XFS_TEST_ERROR(!di_ok, mp, + XFS_ERRTAG_ITOBP_INOTOBP, + XFS_RANDOM_ITOBP_INOTOBP))) { + if (imap_flags & XFS_IMAP_BULKSTAT) { + xfs_trans_brelse(tp, bp); + return XFS_ERROR(EINVAL); + } + XFS_CORRUPTION_ERROR("xfs_imap_to_bp", + XFS_ERRLEVEL_HIGH, mp, dip); +#ifdef DEBUG + cmn_err(CE_PANIC, + "Device %s - bad inode magic/vsn " + "daddr %lld #%d (magic=%x)", + XFS_BUFTARG_NAME(mp->m_ddev_targp), + (unsigned long long)imap->im_blkno, i, + be16_to_cpu(dip->di_core.di_magic)); +#endif + xfs_trans_brelse(tp, bp); + return XFS_ERROR(EFSCORRUPTED); + } + } + + xfs_inobp_check(mp, bp); + + /* + * Mark the buffer as an inode buffer now that it looks good + */ + XFS_BUF_SET_VTYPE(bp, B_FS_INO); + + *bpp = bp; + return 0; +} + +/* * This routine is called to map an inode number within a file * system to the buffer containing the on-disk version of the * inode. It returns a pointer to the buffer containing the @@ -145,72 +265,19 @@ xfs_inotobp( xfs_buf_t **bpp, int *offset) { - int di_ok; xfs_imap_t imap; xfs_buf_t *bp; int error; - xfs_dinode_t *dip; - /* - * Call the space management code to find the location of the - * inode on disk. - */ imap.im_blkno = 0; - error = xfs_imap(mp, tp, ino, &imap, XFS_IMAP_LOOKUP); - if (error != 0) { - cmn_err(CE_WARN, - "xfs_inotobp: xfs_imap() returned an " - "error %d on %s. 
Returning error.", error, mp->m_fsname); + error = xfs_ino_to_imap(mp, tp, ino, &imap, XFS_IMAP_LOOKUP); + if (error) return error; - } - /* - * If the inode number maps to a block outside the bounds of the - * file system then return NULL rather than calling read_buf - * and panicing when we get an error from the driver. - */ - if ((imap.im_blkno + imap.im_len) > - XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks)) { - cmn_err(CE_WARN, - "xfs_inotobp: inode number (%llu + %d) maps to a block outside the bounds " - "of the file system %s. Returning EINVAL.", - (unsigned long long)imap.im_blkno, - imap.im_len, mp->m_fsname); - return XFS_ERROR(EINVAL); - } - - /* - * Read in the buffer. If tp is NULL, xfs_trans_read_buf() will - * default to just a read_buf() call. - */ - error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp, imap.im_blkno, - (int)imap.im_len, XFS_BUF_LOCK, &bp); - - if (error) { - cmn_err(CE_WARN, - "xfs_inotobp: xfs_trans_read_buf() returned an " - "error %d on %s. Returning error.", error, mp->m_fsname); + error = xfs_imap_to_bp(mp, tp, &imap, &bp, XFS_BUF_LOCK, 0); + if (error) return error; - } - dip = (xfs_dinode_t *)xfs_buf_offset(bp, 0); - di_ok = - be16_to_cpu(dip->di_core.di_magic) == XFS_DINODE_MAGIC && - XFS_DINODE_GOOD_VERSION(dip->di_core.di_version); - if (unlikely(XFS_TEST_ERROR(!di_ok, mp, XFS_ERRTAG_ITOBP_INOTOBP, - XFS_RANDOM_ITOBP_INOTOBP))) { - XFS_CORRUPTION_ERROR("xfs_inotobp", XFS_ERRLEVEL_LOW, mp, dip); - xfs_trans_brelse(tp, bp); - cmn_err(CE_WARN, - "xfs_inotobp: XFS_TEST_ERROR() returned an " - "error on %s. Returning EFSCORRUPTED.", mp->m_fsname); - return XFS_ERROR(EFSCORRUPTED); - } - - xfs_inobp_check(mp, bp); - /* - * Set *dipp to point to the on-disk inode in the buffer. 
- */ *dipp = (xfs_dinode_t *)xfs_buf_offset(bp, imap.im_boffset); *bpp = bp; *offset = imap.im_boffset; @@ -251,41 +318,15 @@ xfs_itobp( xfs_imap_t imap; xfs_buf_t *bp; int error; - int i; - int ni; if (ip->i_blkno == (xfs_daddr_t)0) { - /* - * Call the space management code to find the location of the - * inode on disk. - */ imap.im_blkno = bno; - if ((error = xfs_imap(mp, tp, ip->i_ino, &imap, - XFS_IMAP_LOOKUP | imap_flags))) + error = xfs_ino_to_imap(mp, tp, ip->i_ino, &imap, + XFS_IMAP_LOOKUP | imap_flags); + if (error) return error; /* - * If the inode number maps to a block outside the bounds - * of the file system then return NULL rather than calling - * read_buf and panicing when we get an error from the - * driver. - */ - if ((imap.im_blkno + imap.im_len) > - XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks)) { -#ifdef DEBUG - xfs_fs_cmn_err(CE_ALERT, mp, "xfs_itobp: " - "(imap.im_blkno (0x%llx) " - "+ imap.im_len (0x%llx)) > " - " XFS_FSB_TO_BB(mp, " - "mp->m_sb.sb_dblocks) (0x%llx)", - (unsigned long long) imap.im_blkno, - (unsigned long long) imap.im_len, - XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks)); -#endif /* DEBUG */ - return XFS_ERROR(EINVAL); - } - - /* * Fill in the fields in the inode that will be used to * map the inode to its buffer from now on. */ @@ -303,76 +344,10 @@ xfs_itobp( } ASSERT(bno == 0 || bno == imap.im_blkno); - /* - * Read in the buffer. If tp is NULL, xfs_trans_read_buf() will - * default to just a read_buf() call. 
- */ - error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp, imap.im_blkno, - (int)imap.im_len, XFS_BUF_LOCK, &bp); - if (error) { -#ifdef DEBUG - xfs_fs_cmn_err(CE_ALERT, mp, "xfs_itobp: " - "xfs_trans_read_buf() returned error %d, " - "imap.im_blkno 0x%llx, imap.im_len 0x%llx", - error, (unsigned long long) imap.im_blkno, - (unsigned long long) imap.im_len); -#endif /* DEBUG */ + error = xfs_imap_to_bp(mp, tp, &imap, &bp, XFS_BUF_LOCK, imap_flags); + if (error) return error; - } - - /* - * Validate the magic number and version of every inode in the buffer - * (if DEBUG kernel) or the first inode in the buffer, otherwise. - * No validation is done here in userspace (xfs_repair). - */ -#if !defined(__KERNEL__) - ni = 0; -#elif defined(DEBUG) - ni = BBTOB(imap.im_len) >> mp->m_sb.sb_inodelog; -#else /* usual case */ - ni = 1; -#endif - - for (i = 0; i < ni; i++) { - int di_ok; - xfs_dinode_t *dip; - - dip = (xfs_dinode_t *)xfs_buf_offset(bp, - (i << mp->m_sb.sb_inodelog)); - di_ok = be16_to_cpu(dip->di_core.di_magic) == XFS_DINODE_MAGIC && - XFS_DINODE_GOOD_VERSION(dip->di_core.di_version); - if (unlikely(XFS_TEST_ERROR(!di_ok, mp, - XFS_ERRTAG_ITOBP_INOTOBP, - XFS_RANDOM_ITOBP_INOTOBP))) { - if (imap_flags & XFS_IMAP_BULKSTAT) { - xfs_trans_brelse(tp, bp); - return XFS_ERROR(EINVAL); - } -#ifdef DEBUG - cmn_err(CE_ALERT, - "Device %s - bad inode magic/vsn " - "daddr %lld #%d (magic=%x)", - XFS_BUFTARG_NAME(mp->m_ddev_targp), - (unsigned long long)imap.im_blkno, i, - be16_to_cpu(dip->di_core.di_magic)); -#endif - XFS_CORRUPTION_ERROR("xfs_itobp", XFS_ERRLEVEL_HIGH, - mp, dip); - xfs_trans_brelse(tp, bp); - return XFS_ERROR(EFSCORRUPTED); - } - } - - xfs_inobp_check(mp, bp); - /* - * Mark the buffer as an inode buffer now that it looks good - */ - XFS_BUF_SET_VTYPE(bp, B_FS_INO); - - /* - * Set *dipp to point to the on-disk inode in the buffer. 
- */ *dipp = (xfs_dinode_t *)xfs_buf_offset(bp, imap.im_boffset); *bpp = bp; return 0; From owner-xfs@oss.sgi.com Wed Nov 21 16:38:15 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 16:38:19 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_62, J_CHICKENPOX_63 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAM0cBb5012689 for ; Wed, 21 Nov 2007 16:38:13 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA14336; Thu, 22 Nov 2007 11:38:19 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAM0cIdD115388474; Thu, 22 Nov 2007 11:38:18 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAM0cHrk115396145; Thu, 22 Nov 2007 11:38:17 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Thu, 22 Nov 2007 11:38:17 +1100 From: David Chinner To: xfs-oss Cc: lkml Subject: [PATCH 5/9] Don't block pdflush when flushing inodes Message-ID: <20071122003817.GK114266761@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4874/Wed Nov 21 12:33:19 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13730 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs When pdflush is writing back inodes, it can get stuck on inode cluster buffers that are currently under I/O. 
This occurs when we write data to multiple inodes in the same inode cluster at the same time. Effectively, delayed allocation marks the inode dirty during the data writeback. Hence if the inode cluster was flushed during the writeback of the first inode, the writeback of the second inode will block waiting for the inode cluster write to complete before writing it again for the newly dirtied inode. We want to prevent this so that we don't block pdflush and slow down all of writeback. Hence we introduce a non-blocking async inode flush flag that pdflush uses. If this flag is set, we use non-blocking operations (e.g. try locks) wherever we can to avoid blocking or issuing extra I/O. Signed-off-by: Dave Chinner --- fs/xfs/linux-2.6/xfs_super.c | 3 + fs/xfs/linux-2.6/xfs_vnode.h | 5 -- fs/xfs/xfs_inode.c | 82 +++++++++++++++++++++++++++++-------------- fs/xfs/xfs_inode.h | 8 +++- fs/xfs/xfs_trans_buf.c | 3 + fs/xfs/xfs_vnodeops.c | 55 ++++++---------------------- 6 files changed, 79 insertions(+), 77 deletions(-) Index: 2.6.x-xfs-new/fs/xfs/xfs_inode.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_inode.c 2007-11-22 10:33:43.014729931 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_inode.c 2007-11-22 10:33:51.037704348 +1100 @@ -183,12 +183,20 @@ xfs_imap_to_bp( int ni; xfs_buf_t *bp; + if (buf_flags == 0) + buf_flags = XFS_BUF_LOCK; + error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp, imap->im_blkno, - (int)imap->im_len, XFS_BUF_LOCK, &bp); + (int)imap->im_len, buf_flags, &bp); if (error) { - cmn_err(CE_WARN, "xfs_imap_to_bp: xfs_trans_read_buf()returned " + if (error != EAGAIN) { + cmn_err(CE_WARN, + "xfs_imap_to_bp: xfs_trans_read_buf()returned " "an error %d on %s. Returning error.", error, mp->m_fsname); + } else { + ASSERT(buf_flags & XFS_BUF_TRYLOCK); + } return error; } @@ -306,14 +314,15 @@ xfs_inotobp( * 0 for the disk block address. 
*/ int -xfs_itobp( +xfs_itobp_flags( xfs_mount_t *mp, xfs_trans_t *tp, xfs_inode_t *ip, xfs_dinode_t **dipp, xfs_buf_t **bpp, xfs_daddr_t bno, - uint imap_flags) + uint imap_flags, + uint buf_flags) { xfs_imap_t imap; xfs_buf_t *bp; @@ -344,10 +353,17 @@ xfs_itobp( } ASSERT(bno == 0 || bno == imap.im_blkno); - error = xfs_imap_to_bp(mp, tp, &imap, &bp, XFS_BUF_LOCK, imap_flags); + error = xfs_imap_to_bp(mp, tp, &imap, &bp, buf_flags, imap_flags); if (error) return error; + if (!bp) { + ASSERT(buf_flags & XFS_BUF_TRYLOCK); + ASSERT(tp == NULL); + *bpp = NULL; + return EAGAIN; + } + *dipp = (xfs_dinode_t *)xfs_buf_offset(bp, imap.im_boffset); *bpp = bp; return 0; @@ -3023,6 +3039,7 @@ xfs_iflush( int bufwasdelwri; struct hlist_node *entry; enum { INT_DELWRI = (1 << 0), INT_ASYNC = (1 << 1) }; + int noblock = (flags == XFS_IFLUSH_ASYNC_NOBLOCK); XFS_STATS_INC(xs_iflush_count); @@ -3047,11 +3064,22 @@ xfs_iflush( } /* - * We can't flush the inode until it is unpinned, so - * wait for it. We know noone new can pin it, because - * we are holding the inode lock shared and you need - * to hold it exclusively to pin the inode. + * We can't flush the inode until it is unpinned, so wait for it if we + * are allowed to block. We know noone new can pin it, because we are + * holding the inode lock shared and you need to hold it exclusively to + * pin the inode. + * + * If we are not allowed to block, force the log out asynchronously so + * that when we come back the inode will be unpinned. If other inodes + * in the same cluster are dirty, they will probably write the inode + * out for us if they occur after the log force completes. */ + + if (noblock && xfs_ipincount(ip)) { + xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE); + xfs_ifunlock(ip); + return EAGAIN; + } xfs_iunpin_wait(ip); /* @@ -3068,15 +3096,6 @@ xfs_iflush( } /* - * Get the buffer containing the on-disk inode. 
- */ - error = xfs_itobp(mp, NULL, ip, &dip, &bp, 0, 0); - if (error) { - xfs_ifunlock(ip); - return error; - } - - /* * Decide how buffer will be flushed out. This is done before * the call to xfs_iflush_int because this field is zeroed by it. */ @@ -3092,6 +3111,7 @@ xfs_iflush( case XFS_IFLUSH_DELWRI_ELSE_SYNC: flags = 0; break; + case XFS_IFLUSH_ASYNC_NOBLOCK: case XFS_IFLUSH_ASYNC: case XFS_IFLUSH_DELWRI_ELSE_ASYNC: flags = INT_ASYNC; @@ -3111,6 +3131,7 @@ xfs_iflush( case XFS_IFLUSH_DELWRI: flags = INT_DELWRI; break; + case XFS_IFLUSH_ASYNC_NOBLOCK: case XFS_IFLUSH_ASYNC: flags = INT_ASYNC; break; @@ -3125,6 +3146,16 @@ xfs_iflush( } /* + * Get the buffer containing the on-disk inode. + */ + error = xfs_itobp_flags(mp, NULL, ip, &dip, &bp, 0, 0, + (noblock) ? XFS_BUF_TRYLOCK : XFS_BUF_LOCK); + if (error || !bp) { + xfs_ifunlock(ip); + return error; + } + + /* * First flush out the inode that xfs_iflush was called with. */ error = xfs_iflush_int(ip, bp); @@ -3133,6 +3164,13 @@ xfs_iflush( } /* + * If the buffer is pinned then push on the log now so we won't + * get stuck waiting in the write for too long. + */ + if (XFS_BUF_ISPINNED(bp)) + xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE); + + /* * inode clustering: * see if other inodes can be gathered into this write */ @@ -3201,14 +3239,6 @@ xfs_iflush( XFS_STATS_ADD(xs_icluster_flushinode, clcount); } - /* - * If the buffer is pinned then push on the log so we won't - * get stuck waiting in the write for too long. 
- */ - if (XFS_BUF_ISPINNED(bp)){ - xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE); - } - if (flags & INT_DELWRI) { xfs_bdwrite(mp, bp); } else if (flags & INT_ASYNC) { Index: 2.6.x-xfs-new/fs/xfs/xfs_inode.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_inode.h 2007-11-22 10:25:25.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_inode.h 2007-11-22 10:33:51.041703837 +1100 @@ -436,6 +436,7 @@ xfs_iflags_test_and_clear(xfs_inode_t *i #define XFS_IFLUSH_SYNC 3 #define XFS_IFLUSH_ASYNC 4 #define XFS_IFLUSH_DELWRI 5 +#define XFS_IFLUSH_ASYNC_NOBLOCK 6 /* * Flags for xfs_itruncate_start(). @@ -488,9 +489,12 @@ int xfs_finish_reclaim_all(struct xfs_m /* * xfs_inode.c prototypes. */ -int xfs_itobp(struct xfs_mount *, struct xfs_trans *, +int xfs_itobp_flags(struct xfs_mount *, struct xfs_trans *, xfs_inode_t *, struct xfs_dinode **, struct xfs_buf **, - xfs_daddr_t, uint); + xfs_daddr_t, uint, uint); +#define xfs_itobp(mp, tp, ip, dipp, bpp, bno, iflags) \ + xfs_itobp_flags(mp, tp, ip, dipp, bpp, bno, iflags, XFS_BUF_LOCK) + int xfs_iread(struct xfs_mount *, struct xfs_trans *, xfs_ino_t, xfs_inode_t **, xfs_daddr_t, uint); int xfs_iread_extents(struct xfs_trans *, xfs_inode_t *, int); Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_super.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_super.c 2007-11-22 10:25:24.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_super.c 2007-11-22 10:33:51.041703837 +1100 @@ -840,7 +840,8 @@ xfs_fs_write_inode( struct inode *inode, int sync) { - int error = 0, flags = FLUSH_INODE; + int error = 0; + int flags = 0; xfs_itrace_entry(XFS_I(inode)); if (sync) { Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_vnode.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_vnode.h 2007-11-22 10:25:24.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_vnode.h 2007-11-22 
10:33:51.041703837 +1100 @@ -73,12 +73,9 @@ typedef enum bhv_vrwlock { #define IO_INVIS 0x00020 /* don't update inode timestamps */ /* - * Flags for vop_iflush call + * Flags for xfs_inode_flush */ #define FLUSH_SYNC 1 /* wait for flush to complete */ -#define FLUSH_INODE 2 /* flush the inode itself */ -#define FLUSH_LOG 4 /* force the last log entry for - * this inode out to disk */ /* * Flush/Invalidate options for vop_toss/flush/flushinval_pages. Index: 2.6.x-xfs-new/fs/xfs/xfs_vnodeops.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_vnodeops.c 2007-11-22 10:25:26.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_vnodeops.c 2007-11-22 10:33:51.045703325 +1100 @@ -3546,29 +3546,6 @@ xfs_inode_flush( ((iip == NULL) || !(iip->ili_format.ilf_fields & XFS_ILOG_ALL))) return 0; - if (flags & FLUSH_LOG) { - if (iip && iip->ili_last_lsn) { - xlog_t *log = mp->m_log; - xfs_lsn_t sync_lsn; - int log_flags = XFS_LOG_FORCE; - - spin_lock(&log->l_grant_lock); - sync_lsn = log->l_last_sync_lsn; - spin_unlock(&log->l_grant_lock); - - if ((XFS_LSN_CMP(iip->ili_last_lsn, sync_lsn) > 0)) { - if (flags & FLUSH_SYNC) - log_flags |= XFS_LOG_SYNC; - error = xfs_log_force(mp, iip->ili_last_lsn, log_flags); - if (error) - return error; - } - - if (ip->i_update_core == 0) - return 0; - } - } - /* * We make this non-blocking if the inode is contended, * return EAGAIN to indicate to the caller that they @@ -3576,30 +3553,22 @@ xfs_inode_flush( * blocking on inodes inside another operation right * now, they get caught later by xfs_sync. 
*/ - if (flags & FLUSH_INODE) { - int flush_flags; - - if (flags & FLUSH_SYNC) { - xfs_ilock(ip, XFS_ILOCK_SHARED); - xfs_iflock(ip); - } else if (xfs_ilock_nowait(ip, XFS_ILOCK_SHARED)) { - if (xfs_ipincount(ip) || !xfs_iflock_nowait(ip)) { - xfs_iunlock(ip, XFS_ILOCK_SHARED); - return EAGAIN; - } - } else { + if (flags & FLUSH_SYNC) { + xfs_ilock(ip, XFS_ILOCK_SHARED); + xfs_iflock(ip); + } else if (xfs_ilock_nowait(ip, XFS_ILOCK_SHARED)) { + if (xfs_ipincount(ip) || !xfs_iflock_nowait(ip)) { + xfs_iunlock(ip, XFS_ILOCK_SHARED); return EAGAIN; } - - if (flags & FLUSH_SYNC) - flush_flags = XFS_IFLUSH_SYNC; - else - flush_flags = XFS_IFLUSH_ASYNC; - - error = xfs_iflush(ip, flush_flags); - xfs_iunlock(ip, XFS_ILOCK_SHARED); + } else { + return EAGAIN; } + error = xfs_iflush(ip, (flags & FLUSH_SYNC) ? XFS_IFLUSH_SYNC + : XFS_IFLUSH_ASYNC_NOBLOCK); + xfs_iunlock(ip, XFS_ILOCK_SHARED); + return error; } Index: 2.6.x-xfs-new/fs/xfs/xfs_trans_buf.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_trans_buf.c 2007-11-22 10:25:24.000000000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_trans_buf.c 2007-11-22 10:33:51.045703325 +1100 @@ -304,7 +304,8 @@ xfs_trans_read_buf( if (tp == NULL) { bp = xfs_buf_read_flags(target, blkno, len, flags | BUF_BUSY); if (!bp) - return XFS_ERROR(ENOMEM); + return (flags & XFS_BUF_TRYLOCK) ? 
+ EAGAIN : XFS_ERROR(ENOMEM); if ((bp != NULL) && (XFS_BUF_GETERROR(bp) != 0)) { xfs_ioerror_alert("xfs_trans_read_buf", mp, From owner-xfs@oss.sgi.com Wed Nov 21 16:39:52 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 16:39:55 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAM0dlYs013291 for ; Wed, 21 Nov 2007 16:39:49 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA14405; Thu, 22 Nov 2007 11:39:54 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAM0drdD115263098; Thu, 22 Nov 2007 11:39:53 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAM0dqRQ115523958; Thu, 22 Nov 2007 11:39:52 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Thu, 22 Nov 2007 11:39:52 +1100 From: David Chinner To: xfs-oss Cc: lkml Subject: [PATCH 6/9] Remove xfs_icluster Message-ID: <20071122003952.GL114266761@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4874/Wed Nov 21 12:33:19 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13731 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Remove the xfs_icluster structure and replace with a radix tree lookup. 
We don't need to keep a list of inodes in each cluster around anymore as we can look them up quickly when we need to. The only time we need to do this now is during inode writeback. Factor the inode cluster writeback code out of xfs_iflush and convert it to use radix_tree_gang_lookup() instead of walking a list of inodes built when we first read in the inodes. This removes three pointers from each xfs_inode structure, as well as the xfs_icluster structure allocated per inode cluster. Hence we reduce the cache footprint of the xfs_inodes by 5-10%, depending on cluster sparseness. To be truly efficient we need a radix_tree_gang_lookup_range() call to stop searching once we are past the end of the cluster instead of trying to find a full cluster's worth of inodes. Before (ia64): $ cat /sys/slab/xfs_inode/object_size 536 After: $ cat /sys/slab/xfs_inode/object_size 512 Signed-off-by: Dave Chinner --- fs/xfs/linux-2.6/xfs_ksyms.c | 1 fs/xfs/xfs_iget.c | 49 ------- fs/xfs/xfs_inode.c | 266 ++++++++++++++++++++++++------------------- fs/xfs/xfs_inode.h | 16 -- fs/xfs/xfs_vfsops.c | 5 fs/xfs/xfsidbg.c | 4 6 files changed, 154 insertions(+), 187 deletions(-) Index: 2.6.x-xfs-new/fs/xfs/xfs_iget.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_iget.c 2007-11-22 10:25:24.178458638 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_iget.c 2007-11-22 10:33:53.993326524 +1100 @@ -78,7 +78,6 @@ xfs_iget_core( xfs_inode_t *ip; xfs_inode_t *iq; int error; - xfs_icluster_t *icl, *new_icl = NULL; unsigned long first_index, mask; xfs_perag_t *pag; xfs_agino_t agino; @@ -229,11 +228,9 @@ finish_inode: } /* - * This is a bit messy - we preallocate everything we _might_ - * need before we pick up the ici lock. That way we don't have to - * juggle locks and go all the way back to the start. + * Preload the radix tree so we can insert safely under the + * write spinlock. 
*/ - new_icl = kmem_zone_alloc(xfs_icluster_zone, KM_SLEEP); if (radix_tree_preload(GFP_KERNEL)) { delay(1); goto again; @@ -241,17 +238,6 @@ finish_inode: mask = ~(((XFS_INODE_CLUSTER_SIZE(mp) >> mp->m_sb.sb_inodelog)) - 1); first_index = agino & mask; write_lock(&pag->pag_ici_lock); - - /* - * Find the cluster if it exists - */ - icl = NULL; - if (radix_tree_gang_lookup(&pag->pag_ici_root, (void**)&iq, - first_index, 1)) { - if ((XFS_INO_TO_AGINO(mp, iq->i_ino) & mask) == first_index) - icl = iq->i_cluster; - } - /* * insert the new inode */ @@ -266,30 +252,13 @@ finish_inode: } /* - * These values _must_ be set before releasing ihlock! + * These values _must_ be set before releasing the radix tree lock! */ ip->i_udquot = ip->i_gdquot = NULL; xfs_iflags_set(ip, XFS_INEW); - ASSERT(ip->i_cluster == NULL); - - if (!icl) { - spin_lock_init(&new_icl->icl_lock); - INIT_HLIST_HEAD(&new_icl->icl_inodes); - icl = new_icl; - new_icl = NULL; - } else { - ASSERT(!hlist_empty(&icl->icl_inodes)); - } - spin_lock(&icl->icl_lock); - hlist_add_head(&ip->i_cnode, &icl->icl_inodes); - ip->i_cluster = icl; - spin_unlock(&icl->icl_lock); - write_unlock(&pag->pag_ici_lock); radix_tree_preload_end(); - if (new_icl) - kmem_zone_free(xfs_icluster_zone, new_icl); /* * Link ip to its mount and thread it on the mount's inode list. @@ -528,18 +497,6 @@ xfs_iextract( xfs_put_perag(mp, pag); /* - * Remove from cluster list - */ - mp = ip->i_mount; - spin_lock(&ip->i_cluster->icl_lock); - hlist_del(&ip->i_cnode); - spin_unlock(&ip->i_cluster->icl_lock); - - /* was last inode in cluster? */ - if (hlist_empty(&ip->i_cluster->icl_inodes)) - kmem_zone_free(xfs_icluster_zone, ip->i_cluster); - - /* * Remove from mount's inode list. 
*/ XFS_MOUNT_ILOCK(mp); Index: 2.6.x-xfs-new/fs/xfs/xfs_inode.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_inode.c 2007-11-22 10:33:51.037704348 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_inode.c 2007-11-22 10:33:53.993326524 +1100 @@ -53,7 +53,6 @@ kmem_zone_t *xfs_ifork_zone; kmem_zone_t *xfs_inode_zone; -kmem_zone_t *xfs_icluster_zone; /* * Used in xfs_itruncate(). This is the maximum number of extents @@ -3014,6 +3013,151 @@ xfs_iflush_fork( return 0; } +STATIC int +xfs_iflush_cluster( + xfs_inode_t *ip, + xfs_buf_t *bp) +{ + xfs_mount_t *mp = ip->i_mount; + xfs_perag_t *pag = xfs_get_perag(mp, ip->i_ino); + unsigned long first_index, mask; + int ilist_size; + xfs_inode_t *ilist; + xfs_inode_t *iq; + xfs_inode_log_item_t *iip; + int nr_found; + int clcount = 0; + int bufwasdelwri; + + ASSERT(pag->pagi_inodeok); + ASSERT(pag->pag_ici_init); + + ilist_size = XFS_INODE_CLUSTER_SIZE(mp) * sizeof(xfs_inode_t *); + ilist = kmem_alloc(ilist_size, KM_MAYFAIL); + if (!ilist) + return 0; + + mask = ~(((XFS_INODE_CLUSTER_SIZE(mp) >> mp->m_sb.sb_inodelog)) - 1); + first_index = XFS_INO_TO_AGINO(mp, ip->i_ino) & mask; + read_lock(&pag->pag_ici_lock); + /* really need a gang lookup range call here */ + nr_found = radix_tree_gang_lookup(&pag->pag_ici_root, (void**)&ilist, + first_index, + XFS_INODE_CLUSTER_SIZE(mp)); + if (nr_found == 0) + goto out_free; + + for (iq = &ilist[0]; iq < &ilist[nr_found]; iq++) { + if (iq == ip) + continue; + /* if the inode lies outside this cluster, we're done. */ + if ((XFS_INO_TO_AGINO(mp, iq->i_ino) & mask) != first_index) + break; + /* + * Do an un-protected check to see if the inode is dirty and + * is a candidate for flushing. These checks will be repeated + * later after the appropriate locks are acquired. 
+ */ + iip = iq->i_itemp; + if ((iq->i_update_core == 0) && + ((iip == NULL) || + !(iip->ili_format.ilf_fields & XFS_ILOG_ALL)) && + xfs_ipincount(iq) == 0) { + continue; + } + + /* + * Try to get locks. If any are unavailable or it is pinned, + * then this inode cannot be flushed and is skipped. + */ + + if (!xfs_ilock_nowait(iq, XFS_ILOCK_SHARED)) + continue; + if (!xfs_iflock_nowait(iq)) { + xfs_iunlock(iq, XFS_ILOCK_SHARED); + continue; + } + if (xfs_ipincount(iq)) { + xfs_ifunlock(iq); + xfs_iunlock(iq, XFS_ILOCK_SHARED); + continue; + } + + /* + * arriving here means that this inode can be flushed. First + * re-check that it's dirty before flushing. + */ + iip = iq->i_itemp; + if ((iq->i_update_core != 0) || ((iip != NULL) && + (iip->ili_format.ilf_fields & XFS_ILOG_ALL))) { + int error; + error = xfs_iflush_int(iq, bp); + if (error) { + xfs_iunlock(iq, XFS_ILOCK_SHARED); + goto cluster_corrupt_out; + } + clcount++; + } else { + xfs_ifunlock(iq); + } + xfs_iunlock(iq, XFS_ILOCK_SHARED); + } + + if (clcount) { + XFS_STATS_INC(xs_icluster_flushcnt); + XFS_STATS_ADD(xs_icluster_flushinode, clcount); + } + +out_free: + read_unlock(&pag->pag_ici_lock); + kmem_free(ilist, ilist_size); + return 0; + + +cluster_corrupt_out: + /* + * Corruption detected in the clustering loop. Invalidate the + * inode buffer and shut down the filesystem. + */ + read_unlock(&pag->pag_ici_lock); + /* + * Clean up the buffer. If it was B_DELWRI, just release it -- + * brelse can handle it with no problems. If not, shut down the + * filesystem before releasing the buffer. + */ + bufwasdelwri = XFS_BUF_ISDELAYWRITE(bp); + if (bufwasdelwri) + xfs_buf_relse(bp); + + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); + + if (!bufwasdelwri) { + /* + * Just like incore_relse: if we have b_iodone functions, + * mark the buffer as an error and call them. Otherwise + * mark it as stale and brelse. 
+ */ + if (XFS_BUF_IODONE_FUNC(bp)) { + XFS_BUF_CLR_BDSTRAT_FUNC(bp); + XFS_BUF_UNDONE(bp); + XFS_BUF_STALE(bp); + XFS_BUF_SHUT(bp); + XFS_BUF_ERROR(bp,EIO); + xfs_biodone(bp); + } else { + XFS_BUF_STALE(bp); + xfs_buf_relse(bp); + } + } + + /* + * Unlocks the flush lock + */ + xfs_iflush_abort(iq); + kmem_free(ilist, ilist_size); + return XFS_ERROR(EFSCORRUPTED); +} + /* * xfs_iflush() will write a modified inode's changes out to the * inode's on disk home. The caller must have the inode lock held @@ -3033,13 +3177,8 @@ xfs_iflush( xfs_dinode_t *dip; xfs_mount_t *mp; int error; - /* REFERENCED */ - xfs_inode_t *iq; - int clcount; /* count of inodes clustered */ - int bufwasdelwri; - struct hlist_node *entry; - enum { INT_DELWRI = (1 << 0), INT_ASYNC = (1 << 1) }; int noblock = (flags == XFS_IFLUSH_ASYNC_NOBLOCK); + enum { INT_DELWRI = (1 << 0), INT_ASYNC = (1 << 1) }; XFS_STATS_INC(xs_iflush_count); @@ -3159,9 +3298,8 @@ xfs_iflush( * First flush out the inode that xfs_iflush was called with. */ error = xfs_iflush_int(ip, bp); - if (error) { + if (error) goto corrupt_out; - } /* * If the buffer is pinned then push on the log now so we won't @@ -3174,70 +3312,9 @@ xfs_iflush( * inode clustering: * see if other inodes can be gathered into this write */ - spin_lock(&ip->i_cluster->icl_lock); - ip->i_cluster->icl_buf = bp; - - clcount = 0; - hlist_for_each_entry(iq, entry, &ip->i_cluster->icl_inodes, i_cnode) { - if (iq == ip) - continue; - - /* - * Do an un-protected check to see if the inode is dirty and - * is a candidate for flushing. These checks will be repeated - * later after the appropriate locks are acquired. - */ - iip = iq->i_itemp; - if ((iq->i_update_core == 0) && - ((iip == NULL) || - !(iip->ili_format.ilf_fields & XFS_ILOG_ALL)) && - xfs_ipincount(iq) == 0) { - continue; - } - - /* - * Try to get locks. If any are unavailable, - * then this inode cannot be flushed and is skipped. 
- */ - - /* get inode locks (just i_lock) */ - if (xfs_ilock_nowait(iq, XFS_ILOCK_SHARED)) { - /* get inode flush lock */ - if (xfs_iflock_nowait(iq)) { - /* check if pinned */ - if (xfs_ipincount(iq) == 0) { - /* arriving here means that - * this inode can be flushed. - * first re-check that it's - * dirty - */ - iip = iq->i_itemp; - if ((iq->i_update_core != 0)|| - ((iip != NULL) && - (iip->ili_format.ilf_fields & XFS_ILOG_ALL))) { - clcount++; - error = xfs_iflush_int(iq, bp); - if (error) { - xfs_iunlock(iq, - XFS_ILOCK_SHARED); - goto cluster_corrupt_out; - } - } else { - xfs_ifunlock(iq); - } - } else { - xfs_ifunlock(iq); - } - } - xfs_iunlock(iq, XFS_ILOCK_SHARED); - } - } - spin_unlock(&ip->i_cluster->icl_lock); - - if (clcount) { - XFS_STATS_INC(xs_icluster_flushcnt); - XFS_STATS_ADD(xs_icluster_flushinode, clcount); - } + error = xfs_iflush_cluster(ip, bp); + if (error) + goto cluster_corrupt_out; if (flags & INT_DELWRI) { xfs_bdwrite(mp, bp); @@ -3251,52 +3328,11 @@ xfs_iflush( corrupt_out: xfs_buf_relse(bp); xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); - xfs_iflush_abort(ip); - /* - * Unlocks the flush lock - */ - return XFS_ERROR(EFSCORRUPTED); - cluster_corrupt_out: - /* Corruption detected in the clustering loop. Invalidate the - * inode buffer and shut down the filesystem. - */ - spin_unlock(&ip->i_cluster->icl_lock); - - /* - * Clean up the buffer. If it was B_DELWRI, just release it -- - * brelse can handle it with no problems. If not, shut down the - * filesystem before releasing the buffer. - */ - if ((bufwasdelwri= XFS_BUF_ISDELAYWRITE(bp))) { - xfs_buf_relse(bp); - } - - xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); - - if(!bufwasdelwri) { - /* - * Just like incore_relse: if we have b_iodone functions, - * mark the buffer as an error and call them. Otherwise - * mark it as stale and brelse. 
- */ - if (XFS_BUF_IODONE_FUNC(bp)) { - XFS_BUF_CLR_BDSTRAT_FUNC(bp); - XFS_BUF_UNDONE(bp); - XFS_BUF_STALE(bp); - XFS_BUF_SHUT(bp); - XFS_BUF_ERROR(bp,EIO); - xfs_biodone(bp); - } else { - XFS_BUF_STALE(bp); - xfs_buf_relse(bp); - } - } - - xfs_iflush_abort(iq); /* * Unlocks the flush lock */ + xfs_iflush_abort(ip); return XFS_ERROR(EFSCORRUPTED); } Index: 2.6.x-xfs-new/fs/xfs/xfs_inode.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_inode.h 2007-11-22 10:33:51.041703837 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_inode.h 2007-11-22 10:33:53.997326012 +1100 @@ -133,19 +133,6 @@ typedef struct dm_attrs_s { } dm_attrs_t; /* - * This is the xfs inode cluster structure. This structure is used by - * xfs_iflush to find inodes that share a cluster and can be flushed to disk at - * the same time. - */ -typedef struct xfs_icluster { - struct hlist_head icl_inodes; /* list of inodes on cluster */ - xfs_daddr_t icl_blkno; /* starting block number of - * the cluster */ - struct xfs_buf *icl_buf; /* the inode buffer */ - spinlock_t icl_lock; /* inode list lock */ -} xfs_icluster_t; - -/* * This is the xfs in-core inode structure. * Most of the on-disk inode is embedded in the i_d field. 
* @@ -252,8 +239,6 @@ typedef struct xfs_inode { unsigned int i_delayed_blks; /* count of delay alloc blks */ xfs_icdinode_t i_d; /* most of ondisk inode */ - xfs_icluster_t *i_cluster; /* cluster list header */ - struct hlist_node i_cnode; /* cluster link node */ xfs_fsize_t i_size; /* in-memory size */ xfs_fsize_t i_new_size; /* size when write completes */ @@ -578,7 +563,6 @@ void xfs_inobp_check(struct xfs_mount * #define xfs_inobp_check(mp, bp) #endif /* DEBUG */ -extern struct kmem_zone *xfs_icluster_zone; extern struct kmem_zone *xfs_ifork_zone; extern struct kmem_zone *xfs_inode_zone; extern struct kmem_zone *xfs_ili_zone; Index: 2.6.x-xfs-new/fs/xfs/xfs_vfsops.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_vfsops.c 2007-11-22 10:31:27.872002517 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_vfsops.c 2007-11-22 10:33:53.997326012 +1100 @@ -113,9 +113,6 @@ xfs_init(void) xfs_ili_zone = kmem_zone_init_flags(sizeof(xfs_inode_log_item_t), "xfs_ili", KM_ZONE_SPREAD, NULL); - xfs_icluster_zone = - kmem_zone_init_flags(sizeof(xfs_icluster_t), "xfs_icluster", - KM_ZONE_SPREAD, NULL); /* * Allocate global trace buffers. 
@@ -153,7 +150,6 @@ xfs_cleanup(void) extern kmem_zone_t *xfs_inode_zone; extern kmem_zone_t *xfs_efd_zone; extern kmem_zone_t *xfs_efi_zone; - extern kmem_zone_t *xfs_icluster_zone; xfs_cleanup_procfs(); xfs_sysctl_unregister(); @@ -189,7 +185,6 @@ xfs_cleanup(void) kmem_zone_destroy(xfs_efi_zone); kmem_zone_destroy(xfs_ifork_zone); kmem_zone_destroy(xfs_ili_zone); - kmem_zone_destroy(xfs_icluster_zone); } /* Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_ksyms.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_ksyms.c 2007-11-22 10:25:24.226452510 +1100 +++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_ksyms.c 2007-11-22 10:33:53.997326012 +1100 @@ -211,7 +211,6 @@ EXPORT_SYMBOL(xfs_bulkstat); EXPORT_SYMBOL(xfs_bunmapi); EXPORT_SYMBOL(xfs_bwrite); EXPORT_SYMBOL(xfs_change_file_space); -EXPORT_SYMBOL(xfs_icluster_zone); EXPORT_SYMBOL(xfs_dev_is_read_only); EXPORT_SYMBOL(xfs_dir_ialloc); EXPORT_SYMBOL(xfs_error_report); Index: 2.6.x-xfs-new/fs/xfs/xfsidbg.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfsidbg.c 2007-11-22 10:25:25.994226806 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfsidbg.c 2007-11-22 10:33:54.001325501 +1100 @@ -6465,10 +6465,6 @@ xfsidbg_xnode(xfs_inode_t *ip) qprintf(" dir trace 0x%p\n", ip->i_dir_trace); #endif kdb_printf("\n"); - kdb_printf("icluster 0x%p cnext 0x%p cprev 0x%p\n", - ip->i_cluster, - ip->i_cnode.next, - ip->i_cnode.pprev); xfs_xnode_fork("data", &ip->i_df); xfs_xnode_fork("attr", ip->i_afp); kdb_printf("\n"); From owner-xfs@oss.sgi.com Wed Nov 21 16:41:31 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 16:41:36 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com 
(8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAM0fSxN013840 for ; Wed, 21 Nov 2007 16:41:29 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA14495; Thu, 22 Nov 2007 11:41:34 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAM0fXdD115547324; Thu, 22 Nov 2007 11:41:33 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAM0fWu9114277575; Thu, 22 Nov 2007 11:41:32 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Thu, 22 Nov 2007 11:41:32 +1100 From: David Chinner To: xfs-oss Cc: lkml Subject: [PATCH 7/9] Use radix_tree_gang_lookup_range for cluster lookups Message-ID: <20071122004132.GM114266761@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4874/Wed Nov 21 12:33:19 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13732 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs

Use radix_tree_gang_lookup_range() for inode cluster lookups

Now that we have an efficient lookup method for the radix tree, convert cluster lookups to use it. Factor out the common lookup, add some debug checking to it and call it where needed.

For sanity, we need to hold the radix tree lock in read mode across the entire set of locking operations done to ensure we can operate on the inodes. This does increase the length of time we hold the lock in read mode, but we'll correct that with another patch.
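
The index arithmetic behind the factored-out cluster lookup can be sketched in userspace as below. This is only an illustration, not the kernel code: cluster_range, cluster_index_range() and index_in_cluster() are hypothetical names, plain unsigned longs stand in for AG inode numbers, and the number of inodes per cluster is assumed to be a power of two. The last index is derived here as first + n - 1 rather than by masking ino + clsize as the patch does.

```c
#include <assert.h>

/*
 * Userspace sketch of the cluster index arithmetic only.
 * Assumes inodes_per_cluster is a power of two; all names here
 * are hypothetical and do not exist in the XFS source.
 */
struct cluster_range {
	unsigned long first;	/* first index in the cluster */
	unsigned long last;	/* last index in the cluster, inclusive */
};

static struct cluster_range
cluster_index_range(unsigned long agino, unsigned long inodes_per_cluster)
{
	struct cluster_range r;
	unsigned long mask = ~(inodes_per_cluster - 1);

	/* Round down to the start of the cluster containing agino. */
	r.first = agino & mask;
	r.last = r.first + inodes_per_cluster - 1;
	return r;
}

/* Does a looked-up index fall inside the same cluster as agino? */
static int
index_in_cluster(unsigned long agino, unsigned long idx,
		 unsigned long inodes_per_cluster)
{
	struct cluster_range r = cluster_index_range(agino, inodes_per_cluster);

	return idx >= r.first && idx <= r.last;
}
```

A ranged gang lookup bounded by first and last should never return an entry for which index_in_cluster() is false, which is roughly what the DEBUG loop in the patch asserts.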
Signed-off-by: Dave Chinner --- fs/xfs/xfs_inode.c | 83 +++++++++++++++++++++++++++++++++++------------------ 1 file changed, 56 insertions(+), 27 deletions(-) Index: 2.6.x-xfs-new/fs/xfs/xfs_inode.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_inode.c 2007-11-22 10:33:53.993326524 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_inode.c 2007-11-22 10:33:55.877085717 +1100 @@ -2165,6 +2165,37 @@ STATIC_INLINE int xfs_inode_clean(xfs_in (ip->i_update_core == 0)); } +/* lookup all the inodes in the cluster */ +STATIC int +xfs_icluster_lookup( + xfs_mount_t *mp, + xfs_perag_t *pag, + xfs_ino_t ino, + xfs_inode_t **ilist, + int clsize) +{ + unsigned long first_index, last_index, mask; + int nr_found; + + mask = ~(clsize - 1); + first_index = XFS_INO_TO_AGINO(mp, ino) & mask; + last_index = (XFS_INO_TO_AGINO(mp, ino + clsize) & mask) - 1; + nr_found = radix_tree_gang_lookup_range(&pag->pag_ici_root, + (void**)ilist, first_index, last_index, + clsize); + ASSERT(nr_found <= clsize); +#ifdef DEBUG +{ int i; + xfs_inode_t *iq; + for (i = 0; i < nr_found; i++) { + iq = ilist[i]; + ASSERT((XFS_INO_TO_AGINO(mp, iq->i_ino) & mask) == first_index); + } +} +#endif + return nr_found; +} + STATIC void xfs_ifree_cluster( xfs_inode_t *free_ip, @@ -2178,7 +2209,7 @@ xfs_ifree_cluster( int i, j, found, pre_flushed; xfs_daddr_t blkno; xfs_buf_t *bp; - xfs_inode_t *ip, **ip_found; + xfs_inode_t *ip, **ip_found, **ilist; xfs_inode_log_item_t *iip; xfs_log_item_t *lip; xfs_perag_t *pag = xfs_get_perag(mp, inum); @@ -2195,8 +2226,10 @@ xfs_ifree_cluster( } ip_found = kmem_alloc(ninodes * sizeof(xfs_inode_t *), KM_NOFS); + ilist = kmem_alloc(ninodes * sizeof(xfs_inode_t *), KM_NOFS); for (j = 0; j < nbufs; j++, inum += ninodes) { + int nr_found; blkno = XFS_AGB_TO_DADDR(mp, XFS_INO_TO_AGNO(mp, inum), XFS_INO_TO_AGBNO(mp, inum)); @@ -2213,24 +2246,23 @@ xfs_ifree_cluster( * and fail, we need some other form of interlock * here. 
*/ + read_lock(&pag->pag_ici_lock); + nr_found = xfs_icluster_lookup(mp, pag, inum, ilist, ninodes); + if (nr_found == 0) { + read_unlock(&pag->pag_ici_lock); + continue; + } found = 0; - for (i = 0; i < ninodes; i++) { - read_lock(&pag->pag_ici_lock); - ip = radix_tree_lookup(&pag->pag_ici_root, - XFS_INO_TO_AGINO(mp, (inum + i))); + for (i = 0; i < nr_found; i++) { + ip = ilist[i]; /* Inode not in memory or we found it already, * nothing to do */ - if (!ip || xfs_iflags_test(ip, XFS_ISTALE)) { - read_unlock(&pag->pag_ici_lock); + if (!ip || xfs_iflags_test(ip, XFS_ISTALE)) continue; - } - - if (xfs_inode_clean(ip)) { - read_unlock(&pag->pag_ici_lock); + if (xfs_inode_clean(ip)) continue; - } /* If we can get the locks then add it to the * list, otherwise by the time we get the bp lock @@ -2251,7 +2283,6 @@ xfs_ifree_cluster( ip_found[found++] = ip; } } - read_unlock(&pag->pag_ici_lock); continue; } @@ -2269,8 +2300,8 @@ xfs_ifree_cluster( xfs_iunlock(ip, XFS_ILOCK_EXCL); } } - read_unlock(&pag->pag_ici_lock); } + read_unlock(&pag->pag_ici_lock); bp = xfs_trans_get_buf(tp, mp->m_ddev_targp, blkno, mp->m_bsize * blks_per_cluster, @@ -2324,6 +2355,7 @@ xfs_ifree_cluster( } kmem_free(ip_found, ninodes * sizeof(xfs_inode_t *)); + kmem_free(ilist, ninodes * sizeof(xfs_inode_t *)); xfs_put_perag(mp, pag); } @@ -3013,6 +3045,7 @@ xfs_iflush_fork( return 0; } + STATIC int xfs_iflush_cluster( xfs_inode_t *ip, @@ -3020,39 +3053,35 @@ xfs_iflush_cluster( { xfs_mount_t *mp = ip->i_mount; xfs_perag_t *pag = xfs_get_perag(mp, ip->i_ino); - unsigned long first_index, mask; + unsigned long inodes_per_cluster; int ilist_size; - xfs_inode_t *ilist; + xfs_inode_t **ilist; xfs_inode_t *iq; xfs_inode_log_item_t *iip; int nr_found; int clcount = 0; int bufwasdelwri; + int i; ASSERT(pag->pagi_inodeok); ASSERT(pag->pag_ici_init); - ilist_size = XFS_INODE_CLUSTER_SIZE(mp) * sizeof(xfs_inode_t *); + inodes_per_cluster = XFS_INODE_CLUSTER_SIZE(mp) >> mp->m_sb.sb_inodelog; + ilist_size = 
inodes_per_cluster * sizeof(xfs_inode_t *); ilist = kmem_alloc(ilist_size, KM_MAYFAIL); if (!ilist) return 0; - mask = ~(((XFS_INODE_CLUSTER_SIZE(mp) >> mp->m_sb.sb_inodelog)) - 1); - first_index = XFS_INO_TO_AGINO(mp, ip->i_ino) & mask; read_lock(&pag->pag_ici_lock); - /* really need a gang lookup range call here */ - nr_found = radix_tree_gang_lookup(&pag->pag_ici_root, (void**)&ilist, - first_index, - XFS_INODE_CLUSTER_SIZE(mp)); + nr_found = xfs_icluster_lookup(mp, pag, ip->i_ino, ilist, + inodes_per_cluster); if (nr_found == 0) goto out_free; - for (iq = &ilist[0]; iq < &ilist[nr_found]; iq++) { + for (i = 0; i < nr_found; i++) { + iq = ilist[i]; if (iq == ip) continue; - /* if the inode lies outside this cluster, we're done. */ - if ((XFS_INO_TO_AGINO(mp, iq->i_ino) & mask) != first_index) - break; /* * Do an un-protected check to see if the inode is dirty and * is a candidate for flushing. These checks will be repeated From owner-xfs@oss.sgi.com Wed Nov 21 16:42:58 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 16:43:02 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAM0gqkb014381 for ; Wed, 21 Nov 2007 16:42:55 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA14560; Thu, 22 Nov 2007 11:42:59 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAM0gwdD115329899; Thu, 22 Nov 2007 11:42:58 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAM0gwn8114245046; Thu, 22 Nov 2007 11:42:58 +1100 (AEDT) 
X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Thu, 22 Nov 2007 11:42:58 +1100 From: David Chinner To: xfs-oss Cc: lkml Subject: [PATCH 8/9] Convert inode cache locking to RCU Message-ID: <20071122004257.GN114266761@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4874/Wed Nov 21 12:33:19 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13733 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs

Use RCU locking on the inode radix trees

To make use of the efficient radix tree gang lookups for inode cluster operations we had to increase the time we hold the radix tree read lock for. This will affect performance somewhat.

Given that all the lookups are done on a radix tree and we already have mechanisms to determine if an inode is valid or not during lookup, we can pretty easily move this across to lockless lookups using RCU.

The wrinkle is that the current read lock is used to synchronise inode reclaim and lookup. Luckily, the inode flags lock is already taken in the same places where we need this synchronisation, and hence the code can be easily changed to use this lock for reclaim/lookup synchronisation.

Also, we can avoid growing the xfs_inode structure to hold the rcu_head structure for the call_rcu() on inode destruction by reusing the reclaim list list_head structure. We can safely do this because the inode has been removed from the reclaim list before the reclaim code calls xfs_idestroy(). This is effectively the same trick as used in the dentry cache to avoid growing the dentry structure.
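
The structure-sharing trick described above can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: the list_head and rcu_head definitions are simplified stand-ins for the kernel types, demo_inode is not xfs_inode, and demo_defer() only records the callback instead of waiting for a grace period as call_rcu() would.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel's list_head and rcu_head. */
struct list_head {
	struct list_head *next, *prev;
};

struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

/* Hypothetical inode; only the shared union matters here. */
struct demo_inode {
	unsigned long ino;
	union {
		struct list_head reclaim; /* used while on the reclaim list */
		struct rcu_head rcu;      /* used only after removal from it */
	} ru;
	int freed;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void
demo_destroy_callback(struct rcu_head *head)
{
	struct demo_inode *ip =
		demo_container_of(head, struct demo_inode, ru.rcu);

	ip->freed = 1;	/* a real callback would free the inode here */
}

/* Stand-in for call_rcu(): record the callback to be run later. */
static void
demo_defer(struct rcu_head *head, void (*func)(struct rcu_head *))
{
	head->next = NULL;
	head->func = func;
}

/* Stand-in for the end of a grace period: run the recorded callback. */
static void
demo_run_callback(struct rcu_head *head)
{
	head->func(head);
}
```

Because the two members are never live at the same time, the union costs no extra space beyond the larger of the two, which is the point of reusing the reclaim list head.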
Signed-off-by: Dave Chinner --- fs/xfs/xfs_ag.h | 2 fs/xfs/xfs_iget.c | 107 ++++++++++++++++++++++++++++++-------------------- fs/xfs/xfs_inode.c | 47 ++++++++++++--------- fs/xfs/xfs_inode.h | 14 +++++- fs/xfs/xfs_mount.c | 2 fs/xfs/xfs_vnodeops.c | 8 --- 6 files changed, 108 insertions(+), 72 deletions(-) Index: 2.6.x-xfs-new/fs/xfs/xfs_iget.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_iget.c 2007-11-22 10:33:53.993326524 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_iget.c 2007-11-22 10:33:57.724849511 +1100 @@ -40,6 +40,37 @@ #include "xfs_utils.h" /* + * Attempt to move the inode out of the IRECLAIMABLE state. + * Must be called under rcu_read_lock() and with the ip->i_flags_lock + * held for synchronisation with xfs_ireclaim_finish(). + */ +STATIC int +xfs_iget_reclaim_check( + xfs_inode_t *ip, + int flags) +{ + /* + * If IRECLAIM is set this inode is on its way out of the system, we + * need to pause and try again. + */ + if (__xfs_iflags_test(ip, XFS_IRECLAIM)) + return EAGAIN; + ASSERT(__xfs_iflags_test(ip, XFS_IRECLAIMABLE)); + + /* + * If lookup is racing with unlink, then we should return an error + * immediately so we don't remove it from the reclaim list and + * potentially leak the inode. + */ + + if ((ip->i_d.di_mode == 0) && !(flags & XFS_IGET_CREATE)) + return ENOENT; + + __xfs_iflags_clear(ip, XFS_IRECLAIMABLE); + return 0; +} + +/* * Look up an inode by number in the given file system. * The inode is looked up in the cache held in each AG. * If the inode is found in the cache, attach it to the provided @@ -94,7 +125,7 @@ xfs_iget_core( agino = XFS_INO_TO_AGINO(mp, ino); again: - read_lock(&pag->pag_ici_lock); + rcu_read_lock(); ip = radix_tree_lookup(&pag->pag_ici_root, agino); if (ip != NULL) { @@ -103,52 +134,44 @@ again: * we need to pause and try again. 
*/ if (xfs_iflags_test(ip, XFS_INEW)) { - read_unlock(&pag->pag_ici_lock); + rcu_read_unlock(); delay(1); XFS_STATS_INC(xs_ig_frecycle); goto again; } + /* + * Determine if the inode is queued for reclaim or being + * reclaimed. This is trickier now we are under RCU locking. + * + * Basically, xfs_ireclaim_finish() uses the i_flags_lock to + * atomically move the inode out of the IRECLAIMABLE state and + * inode the IRECLAIM state, so we have to use the same lock to + * do an equivalent set of tests and move the inode out of the + * IRECLAIMABLE state. + */ old_inode = ip->i_vnode; if (old_inode == NULL) { - /* - * If IRECLAIM is set this inode is - * on its way out of the system, - * we need to pause and try again. - */ - if (xfs_iflags_test(ip, XFS_IRECLAIM)) { - read_unlock(&pag->pag_ici_lock); - delay(1); + spin_lock(&ip->i_flags_lock); + error = xfs_iget_reclaim_check(ip, flags); + spin_unlock(&ip->i_flags_lock); + rcu_read_unlock(); + if (error) { XFS_STATS_INC(xs_ig_frecycle); - - goto again; - } - ASSERT(xfs_iflags_test(ip, XFS_IRECLAIMABLE)); - - /* - * If lookup is racing with unlink, then we - * should return an error immediately so we - * don't remove it from the reclaim list and - * potentially leak the inode. - */ - if ((ip->i_d.di_mode == 0) && - !(flags & XFS_IGET_CREATE)) { - read_unlock(&pag->pag_ici_lock); + if (error == EAGAIN) { + delay(1); + goto again; + } xfs_put_perag(mp, pag); - return ENOENT; + return error; } - - xfs_itrace_exit_tag(ip, "xfs_iget.alloc"); - - XFS_STATS_INC(xs_ig_found); - xfs_iflags_clear(ip, XFS_IRECLAIMABLE); - read_unlock(&pag->pag_ici_lock); - XFS_MOUNT_ILOCK(mp); list_del_init(&ip->i_reclaim); XFS_MOUNT_IUNLOCK(mp); + XFS_STATS_INC(xs_ig_found); + xfs_itrace_exit_tag(ip, "xfs_iget.alloc"); goto finish_inode; } else if (inode != old_inode) { @@ -156,7 +179,7 @@ again: * try again. 
*/ if (old_inode->i_state & (I_FREEING | I_CLEAR)) { - read_unlock(&pag->pag_ici_lock); + rcu_read_unlock(); delay(1); XFS_STATS_INC(xs_ig_frecycle); @@ -174,7 +197,7 @@ again: /* * Inode cache hit */ - read_unlock(&pag->pag_ici_lock); + rcu_read_unlock(); XFS_STATS_INC(xs_ig_found); finish_inode: @@ -194,7 +217,7 @@ finish_inode: /* * Inode cache miss */ - read_unlock(&pag->pag_ici_lock); + rcu_read_unlock(); XFS_STATS_INC(xs_ig_missed); /* @@ -237,14 +260,14 @@ finish_inode: } mask = ~(((XFS_INODE_CLUSTER_SIZE(mp) >> mp->m_sb.sb_inodelog)) - 1); first_index = agino & mask; - write_lock(&pag->pag_ici_lock); + spin_lock(&pag->pag_ici_lock); /* * insert the new inode */ error = radix_tree_insert(&pag->pag_ici_root, agino, ip); if (unlikely(error)) { BUG_ON(error != -EEXIST); - write_unlock(&pag->pag_ici_lock); + spin_unlock(&pag->pag_ici_lock); radix_tree_preload_end(); xfs_idestroy(ip); XFS_STATS_INC(xs_ig_dup); @@ -257,7 +280,7 @@ finish_inode: ip->i_udquot = ip->i_gdquot = NULL; xfs_iflags_set(ip, XFS_INEW); - write_unlock(&pag->pag_ici_lock); + spin_unlock(&pag->pag_ici_lock); radix_tree_preload_end(); /* @@ -378,9 +401,9 @@ xfs_inode_incore(xfs_mount_t *mp, xfs_perag_t *pag; pag = xfs_get_perag(mp, ino); - read_lock(&pag->pag_ici_lock); + rcu_read_lock(); ip = radix_tree_lookup(&pag->pag_ici_root, XFS_INO_TO_AGINO(mp, ino)); - read_unlock(&pag->pag_ici_lock); + rcu_read_unlock(); xfs_put_perag(mp, pag); /* the returned inode must match the transaction */ @@ -491,9 +514,9 @@ xfs_iextract( xfs_perag_t *pag = xfs_get_perag(mp, ip->i_ino); xfs_inode_t *iq; - write_lock(&pag->pag_ici_lock); + spin_lock(&pag->pag_ici_lock); radix_tree_delete(&pag->pag_ici_root, XFS_INO_TO_AGINO(mp, ip->i_ino)); - write_unlock(&pag->pag_ici_lock); + spin_unlock(&pag->pag_ici_lock); xfs_put_perag(mp, pag); /* Index: 2.6.x-xfs-new/fs/xfs/xfs_inode.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_inode.c 2007-11-22 10:33:55.877085717 
+1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_inode.c 2007-11-22 10:33:57.728849000 +1100 @@ -2169,13 +2169,16 @@ STATIC_INLINE int xfs_inode_clean(xfs_in STATIC int xfs_icluster_lookup( xfs_mount_t *mp, - xfs_perag_t *pag, xfs_ino_t ino, xfs_inode_t **ilist, int clsize) { - unsigned long first_index, last_index, mask; - int nr_found; + unsigned long first_index, last_index, mask; + int nr_found; + xfs_perag_t *pag = xfs_get_perag(mp, ino); + + ASSERT(pag->pagi_inodeok); + ASSERT(pag->pag_ici_init); mask = ~(clsize - 1); first_index = XFS_INO_TO_AGINO(mp, ino) & mask; @@ -2183,6 +2186,7 @@ xfs_icluster_lookup( nr_found = radix_tree_gang_lookup_range(&pag->pag_ici_root, (void**)ilist, first_index, last_index, clsize); + xfs_put_perag(mp, pag); ASSERT(nr_found <= clsize); #ifdef DEBUG { int i; @@ -2212,7 +2216,6 @@ xfs_ifree_cluster( xfs_inode_t *ip, **ip_found, **ilist; xfs_inode_log_item_t *iip; xfs_log_item_t *lip; - xfs_perag_t *pag = xfs_get_perag(mp, inum); if (mp->m_sb.sb_blocksize >= XFS_INODE_CLUSTER_SIZE(mp)) { blks_per_cluster = 1; @@ -2246,10 +2249,10 @@ xfs_ifree_cluster( * and fail, we need some other form of interlock * here. */ - read_lock(&pag->pag_ici_lock); - nr_found = xfs_icluster_lookup(mp, pag, inum, ilist, ninodes); + rcu_read_lock(); + nr_found = xfs_icluster_lookup(mp, inum, ilist, ninodes); if (nr_found == 0) { - read_unlock(&pag->pag_ici_lock); + rcu_read_unlock(); continue; } found = 0; @@ -2301,7 +2304,7 @@ xfs_ifree_cluster( } } } - read_unlock(&pag->pag_ici_lock); + rcu_read_unlock(); bp = xfs_trans_get_buf(tp, mp->m_ddev_targp, blkno, mp->m_bsize * blks_per_cluster, @@ -2356,7 +2359,6 @@ xfs_ifree_cluster( kmem_free(ip_found, ninodes * sizeof(xfs_inode_t *)); kmem_free(ilist, ninodes * sizeof(xfs_inode_t *)); - xfs_put_perag(mp, pag); } /* @@ -2754,8 +2756,18 @@ xfs_idestroy_fork( * This is called free all the memory associated with an inode. * It must free the inode itself and any buffers allocated for * if_extents/if_data and if_broot. 
It must also free the lock - * associated with the inode. + * associated with the inode. This uses RCU callbacks to do the + * freeing of memory so that the iradix cache can use RCU read side + * locking. */ +STATIC void +xfs_idestroy_callback( + struct rcu_head *head) +{ + xfs_inode_t *ip = container_of(head, xfs_inode_t, i_rcu); + kmem_zone_free(xfs_inode_zone, ip); +} + void xfs_idestroy( xfs_inode_t *ip) @@ -2811,10 +2823,9 @@ xfs_idestroy( } xfs_inode_item_destroy(ip); } - kmem_zone_free(xfs_inode_zone, ip); + call_rcu(&ip->i_rcu, xfs_idestroy_callback); } - /* * Increment the pin count of the given buffer. * This value is protected by ipinlock spinlock in the mount structure. @@ -3052,7 +3063,6 @@ xfs_iflush_cluster( xfs_buf_t *bp) { xfs_mount_t *mp = ip->i_mount; - xfs_perag_t *pag = xfs_get_perag(mp, ip->i_ino); unsigned long inodes_per_cluster; int ilist_size; xfs_inode_t **ilist; @@ -3063,17 +3073,14 @@ xfs_iflush_cluster( int bufwasdelwri; int i; - ASSERT(pag->pagi_inodeok); - ASSERT(pag->pag_ici_init); - inodes_per_cluster = XFS_INODE_CLUSTER_SIZE(mp) >> mp->m_sb.sb_inodelog; ilist_size = inodes_per_cluster * sizeof(xfs_inode_t *); ilist = kmem_alloc(ilist_size, KM_MAYFAIL); if (!ilist) return 0; - read_lock(&pag->pag_ici_lock); - nr_found = xfs_icluster_lookup(mp, pag, ip->i_ino, ilist, + rcu_read_lock(); + nr_found = xfs_icluster_lookup(mp, ip->i_ino, ilist, inodes_per_cluster); if (nr_found == 0) goto out_free; @@ -3138,7 +3145,7 @@ xfs_iflush_cluster( } out_free: - read_unlock(&pag->pag_ici_lock); + rcu_read_unlock(); kmem_free(ilist, ilist_size); return 0; @@ -3148,7 +3155,7 @@ cluster_corrupt_out: * Corruption detected in the clustering loop. Invalidate the * inode buffer and shut down the filesystem. */ - read_unlock(&pag->pag_ici_lock); + rcu_read_unlock(); /* * Clean up the buffer. If it was B_DELWRI, just release it -- * brelse can handle it with no problems. 
If not, shut down the Index: 2.6.x-xfs-new/fs/xfs/xfs_inode.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_inode.h 2007-11-22 10:33:53.997326012 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_inode.h 2007-11-22 10:33:57.728849000 +1100 @@ -203,7 +203,12 @@ typedef struct xfs_inode { struct xfs_inode *i_mnext; /* next inode in mount list */ struct xfs_inode *i_mprev; /* ptr to prev inode */ struct xfs_mount *i_mount; /* fs mount struct ptr */ - struct list_head i_reclaim; /* reclaim list */ + union { + struct list_head i_reclaim; /* reclaim list */ + struct rcu_head i_rcu; /* rcu reclaim list */ + } i_ru; +#define i_reclaim i_ru.i_reclaim +#define i_rcu i_ru.i_rcu bhv_vnode_t *i_vnode; /* vnode backpointer */ struct xfs_dquot *i_udquot; /* user dquot */ struct xfs_dquot *i_gdquot; /* group dquot */ @@ -285,10 +290,15 @@ xfs_iflags_set(xfs_inode_t *ip, unsigned } static inline void +__xfs_iflags_clear(xfs_inode_t *ip, unsigned short flags) +{ + ip->i_flags &= ~flags; +} +static inline void xfs_iflags_clear(xfs_inode_t *ip, unsigned short flags) { spin_lock(&ip->i_flags_lock); - ip->i_flags &= ~flags; + __xfs_iflags_clear(ip, flags); spin_unlock(&ip->i_flags_lock); } Index: 2.6.x-xfs-new/fs/xfs/xfs_ag.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_ag.h 2007-11-22 10:25:23.554538298 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_ag.h 2007-11-22 10:33:57.728849000 +1100 @@ -199,7 +199,7 @@ typedef struct xfs_perag atomic_t pagf_fstrms; /* # of filestreams active in this AG */ int pag_ici_init; /* incore inode cache initialised */ - rwlock_t pag_ici_lock; /* incore inode lock */ + spinlock_t pag_ici_lock; /* incore inode lock */ struct radix_tree_root pag_ici_root; /* incore inode cache root */ } xfs_perag_t; Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.c 2007-11-22 
10:31:27.891999961 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_mount.c 2007-11-22 10:33:57.732848488 +1100 @@ -334,7 +334,7 @@ xfs_initialize_perag_icache( xfs_perag_t *pag) { if (!pag->pag_ici_init) { - rwlock_init(&pag->pag_ici_lock); + spin_lock_init(&pag->pag_ici_lock); INIT_RADIX_TREE(&pag->pag_ici_root, GFP_ATOMIC); pag->pag_ici_init = 1; } Index: 2.6.x-xfs-new/fs/xfs/xfs_vnodeops.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_vnodeops.c 2007-11-22 10:33:51.045703325 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_vnodeops.c 2007-11-22 10:33:57.732848488 +1100 @@ -3671,24 +3671,22 @@ xfs_finish_reclaim( int locked, int sync_mode) { - xfs_perag_t *pag = xfs_get_perag(ip->i_mount, ip->i_ino); bhv_vnode_t *vp = XFS_ITOV_NULL(ip); int error; if (vp && VN_BAD(vp)) goto reclaim; - /* The hash lock here protects a thread in xfs_iget_core from + /* + * The flags lock here protects a thread in xfs_iget_core from * racing with us on linking the inode back with a vnode. * Once we have the XFS_IRECLAIM flag set it will not touch * us. */ - write_lock(&pag->pag_ici_lock); spin_lock(&ip->i_flags_lock); if (__xfs_iflags_test(ip, XFS_IRECLAIM) || (!__xfs_iflags_test(ip, XFS_IRECLAIMABLE) && vp == NULL)) { spin_unlock(&ip->i_flags_lock); - write_unlock(&pag->pag_ici_lock); if (locked) { xfs_ifunlock(ip); xfs_iunlock(ip, XFS_ILOCK_EXCL); @@ -3697,8 +3695,6 @@ xfs_finish_reclaim( } __xfs_iflags_set(ip, XFS_IRECLAIM); spin_unlock(&ip->i_flags_lock); - write_unlock(&pag->pag_ici_lock); - xfs_put_perag(ip->i_mount, pag); /* * If the inode is still dirty, then flush it out. 
If the inode From owner-xfs@oss.sgi.com Wed Nov 21 16:44:24 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 16:44:29 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAM0iHST014931 for ; Wed, 21 Nov 2007 16:44:23 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA14650; Thu, 22 Nov 2007 11:44:24 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAM0iNdD115458378; Thu, 22 Nov 2007 11:44:23 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAM0iMa3114769698; Thu, 22 Nov 2007 11:44:22 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Thu, 22 Nov 2007 11:44:22 +1100 From: David Chinner To: xfs-oss Cc: lkml Subject: [PATCH 9/9] Clean up open coded inode dirty checks Message-ID: <20071122004422.GO114266761@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4874/Wed Nov 21 12:33:19 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13734 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Use xfs_inode_clean() in more places. 
Signed-off-by: Dave Chinner --- fs/xfs/xfs_inode.c | 27 +++++---------------------- fs/xfs/xfs_inode_item.h | 8 ++++++++ fs/xfs/xfs_vnodeops.c | 4 +--- 3 files changed, 14 insertions(+), 25 deletions(-) Index: 2.6.x-xfs-new/fs/xfs/xfs_inode.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_inode.c 2007-11-22 10:33:57.728849000 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_inode.c 2007-11-22 10:33:59.692597965 +1100 @@ -2158,13 +2158,6 @@ xfs_iunlink_remove( return 0; } -STATIC_INLINE int xfs_inode_clean(xfs_inode_t *ip) -{ - return (((ip->i_itemp == NULL) || - !(ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL)) && - (ip->i_update_core == 0)); -} - /* lookup all the inodes in the cluster */ STATIC int xfs_icluster_lookup( @@ -3067,7 +3060,6 @@ xfs_iflush_cluster( int ilist_size; xfs_inode_t **ilist; xfs_inode_t *iq; - xfs_inode_log_item_t *iip; int nr_found; int clcount = 0; int bufwasdelwri; @@ -3094,13 +3086,8 @@ xfs_iflush_cluster( * is a candidate for flushing. These checks will be repeated * later after the appropriate locks are acquired. */ - iip = iq->i_itemp; - if ((iq->i_update_core == 0) && - ((iip == NULL) || - !(iip->ili_format.ilf_fields & XFS_ILOG_ALL)) && - xfs_ipincount(iq) == 0) { + if (xfs_inode_clean(iq) && xfs_ipincount(iq) == 0) continue; - } /* * Try to get locks. If any are unavailable or it is pinned, @@ -3123,10 +3110,8 @@ xfs_iflush_cluster( * arriving here means that this inode can be flushed. First * re-check that it's dirty before flushing. */ - iip = iq->i_itemp; - if ((iq->i_update_core != 0) || ((iip != NULL) && - (iip->ili_format.ilf_fields & XFS_ILOG_ALL))) { - int error; + if (!xfs_inode_clean(iq)) { + int error; error = xfs_iflush_int(iq, bp); if (error) { xfs_iunlock(iq, XFS_ILOCK_SHARED); @@ -3230,8 +3215,7 @@ xfs_iflush( * If the inode isn't dirty, then just release the inode * flush lock and do nothing. 
*/ - if ((ip->i_update_core == 0) && - ((iip == NULL) || !(iip->ili_format.ilf_fields & XFS_ILOG_ALL))) { + if (xfs_inode_clean(ip)) { ASSERT((iip != NULL) ? !(iip->ili_item.li_flags & XFS_LI_IN_AIL) : 1); xfs_ifunlock(ip); @@ -3398,8 +3382,7 @@ xfs_iflush_int( * If the inode isn't dirty, then just release the inode * flush lock and do nothing. */ - if ((ip->i_update_core == 0) && - ((iip == NULL) || !(iip->ili_format.ilf_fields & XFS_ILOG_ALL))) { + if (xfs_inode_clean(ip)) { xfs_ifunlock(ip); return 0; } Index: 2.6.x-xfs-new/fs/xfs/xfs_vnodeops.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_vnodeops.c 2007-11-22 10:33:57.732848488 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_vnodeops.c 2007-11-22 10:33:59.696597454 +1100 @@ -3532,7 +3532,6 @@ xfs_inode_flush( int flags) { xfs_mount_t *mp = ip->i_mount; - xfs_inode_log_item_t *iip = ip->i_itemp; int error = 0; if (XFS_FORCED_SHUTDOWN(mp)) @@ -3542,8 +3541,7 @@ xfs_inode_flush( * Bypass inodes which have already been cleaned by * the inode flush clustering code inside xfs_iflush */ - if ((ip->i_update_core == 0) && - ((iip == NULL) || !(iip->ili_format.ilf_fields & XFS_ILOG_ALL))) + if (xfs_inode_clean(ip)) return 0; /* Index: 2.6.x-xfs-new/fs/xfs/xfs_inode_item.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_inode_item.h 2007-11-22 10:25:23.286572511 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_inode_item.h 2007-11-22 10:33:59.696597454 +1100 @@ -168,6 +168,14 @@ static inline int xfs_ilog_fext(int w) return (w == XFS_DATA_FORK ? 
XFS_ILOG_DEXT : XFS_ILOG_AEXT); } +STATIC_INLINE int xfs_inode_clean(xfs_inode_t *ip) +{ + return (((ip->i_itemp == NULL) || + !(ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL)) && + (ip->i_update_core == 0)); +} + + #ifdef __KERNEL__ extern void xfs_inode_item_init(struct xfs_inode *, struct xfs_mount *); From owner-xfs@oss.sgi.com Wed Nov 21 16:46:42 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 16:46:45 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-0.4 required=5.0 tests=AWL,BAYES_40,J_CHICKENPOX_62, J_CHICKENPOX_64 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAM0kbSf015876 for ; Wed, 21 Nov 2007 16:46:39 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA14867; Thu, 22 Nov 2007 11:46:45 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAM0kidD114096661; Thu, 22 Nov 2007 11:46:44 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAM0kilZ114837129; Thu, 22 Nov 2007 11:46:44 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Thu, 22 Nov 2007 11:46:44 +1100 From: David Chinner To: xfs-oss Cc: xfs-dev Subject: [PATCH 1/2] AIL list threading V2 Message-ID: <20071122004643.GP114266761@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4874/Wed Nov 21 12:33:19 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13735 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: 
xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs

When many hundreds to thousands of threads all try to do simultaneous transactions and the log is in a tail-pushing situation (i.e. full), we can get multiple threads walking the AIL list and contending on the AIL lock. Recently we've had two cases of machines basically locking up because most of the CPUs in the system are trying to obtain the AIL lock. The first was an 8p machine with ~2,500 kernel threads trying to do transactions, and the latest is a 2048p altix closing a file per MPI rank in a synchronised fashion resulting in > 400 processes all trying to walk and push the AIL at the same time.

The AIL push is, in effect, a simple I/O dispatch algorithm complicated by the ordering constraints placed on it by the transaction subsystem. It really does not need multiple threads to push on it - even when only a single CPU is pushing the AIL, it can push the I/O out far faster than pretty much any disk subsystem can handle.

So, to avoid contention problems stemming from multiple list walkers, move the list walk off into another thread and simply provide a "target" to push to. When a thread requires a push, it sets the target and wakes the push thread, then goes to sleep waiting for the required amount of space to become available in the log.

This mechanism should also be a lot fairer under heavy load as the waiters will queue in arrival order, rather than queuing in "who completed a push first" order.

Also, by moving the pushing to a separate thread we can do more effective overload detection and prevention as we can keep context from loop iteration to loop iteration. That is, we can push only part of the list each loop and not have to loop back to the start of the list every time we run. This should also help by reducing the number of times we try to lock and/or push items that we cannot move.
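
The "set a target, let one pusher walk the list, keep context between iterations" idea can be sketched in single-threaded userspace C as below. This is a model of the idea only and makes several assumptions: demo_ail, demo_set_target() and demo_push() are hypothetical names, the AIL is modelled as a sorted array of distinct non-zero LSNs, and no threads, locks or wakeups are involved.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long demo_lsn_t;

/* Hypothetical model of the AIL: item LSNs sorted in ascending order. */
struct demo_ail {
	const demo_lsn_t *items;
	size_t nitems;
	demo_lsn_t target;	/* push everything at or below this LSN */
};

/* Requesters only ever raise the target; they never walk the list. */
static void
demo_set_target(struct demo_ail *ail, demo_lsn_t threshold)
{
	if (threshold > ail->target)
		ail->target = threshold;
}

/*
 * One iteration of the single pusher. It skips everything at or below
 * *last_pushed (work done by earlier iterations), pushes at most
 * 'batch' items up to the target, and returns the number pushed.
 */
static size_t
demo_push(const struct demo_ail *ail, demo_lsn_t *last_pushed, size_t batch)
{
	size_t i, pushed = 0;

	for (i = 0; i < ail->nitems && pushed < batch; i++) {
		demo_lsn_t lsn = ail->items[i];

		if (lsn <= *last_pushed)
			continue;	/* handled by an earlier iteration */
		if (lsn > ail->target)
			break;		/* beyond the target, stop here */
		*last_pushed = lsn;
		pushed++;
	}
	return pushed;
}
```

Calling demo_push() repeatedly with the same last_pushed variable makes progress through the list in bounded batches, which is the kind of per-iteration context a dedicated push thread can keep and concurrent walkers cannot.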
Note that this patch is not intended to solve the inefficiencies in the AIL structure and the associated issues with extremely large list contents. That needs to be addressed separately; parallel access would cause problems for any new structure as well, so I'm only aiming to isolate the structure from unbounded parallelism here.

Version 2:
o clean up xfs_trans_push_ail()
o xfs_trans_push_ail() can be done unlocked - the lsn we are returning is never used so we only need to know if the AIL is not empty before deciding whether we need to wake up the push thread.
o only check the threshold lsn against the current target once before waking the aild.
o change checks of mp->m_log to ASSERT()s. Any time this fires it's indicative of a bug as the aild should only run when there is a log.
o fixed switch indentation in xfsaild_push().
o initialised restarts variable correctly.
o lengthen idle timeout to 1s.
o return an error from xfs_trans_ail_init() and propagate it to fail mounting the log.
o add comment to "stuck" checks indicating the source of the magic numbers.
o pinned items are "stuck".
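The idle timeout and "stuck" back-off mentioned in the changelog can be illustrated with a small userspace model. This is a sketch, not the kernel function: struct push_stats and next_timeout_ms() are hypothetical names, and the inputs that xfsaild_push() gathers while walking the list are simply passed in. The numbers (1000 ms when idle, a 10 ms base, +20 ms once the target is reached, +10 ms when restarts exceed 10 or more than 90% of the items tried were stuck) are read from the patch in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors XFS_TRANS_PUSH_AIL_RESTARTS as defined in this patch. */
#define PUSH_RESTART_LIMIT 10

/* Hypothetical summary of one pass of the push loop. */
struct push_stats {
    bool ail_empty;         /* nothing to push at all */
    bool reached_target;    /* last LSN looked at was >= the target */
    int  count;             /* items we tried to push */
    int  stuck;             /* items pinned, locked or already flushing */
    int  restarts;          /* times the list walk had to start over */
};

/* Returns how long (in milliseconds) the push thread should sleep. */
long next_timeout_ms(const struct push_stats *s)
{
    long tout;

    assert(s != NULL);

    /* Idle: nothing in the AIL, so use the long idle timeout. */
    if (s->ail_empty)
        return 1000;

    tout = 10;              /* base delay between passes */

    if (s->reached_target) {
        /* Give the I/O we issued some time to complete. */
        tout += 20;
    } else if (s->restarts > PUSH_RESTART_LIMIT ||
               (s->count && (s->stuck * 100) / s->count > 90)) {
        /* Contended or mostly stuck: back off a little more. */
        tout += 10;
    }
    return tout;
}
```

Because the ratio uses integer division, a pass with exactly 90 stuck items out of 100 does not count as stuck; 91 out of 100 does.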
Signed-Off-By: Dave Chinner --- fs/xfs/linux-2.6/xfs_super.c | 59 +++++++++ fs/xfs/xfs_log.c | 33 ++++- fs/xfs/xfs_mount.c | 6 fs/xfs/xfs_mount.h | 10 + fs/xfs/xfs_trans.h | 5 fs/xfs/xfs_trans_ail.c | 269 ++++++++++++++++++++++++++++--------------- fs/xfs/xfs_trans_priv.h | 8 + fs/xfs/xfsidbg.c | 12 - 8 files changed, 288 insertions(+), 114 deletions(-) Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_super.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_super.c 2007-11-22 10:33:51.041703837 +1100 +++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_super.c 2007-11-22 10:34:01.556359712 +1100 @@ -51,6 +51,7 @@ #include "xfs_vfsops.h" #include "xfs_version.h" #include "xfs_log_priv.h" +#include "xfs_trans_priv.h" #include #include @@ -765,6 +766,64 @@ xfs_blkdev_issue_flush( blkdev_issue_flush(buftarg->bt_bdev, NULL); } +/* + * XFS AIL push thread support + */ +void +xfsaild_wakeup( + xfs_mount_t *mp, + xfs_lsn_t threshold_lsn) +{ + mp->m_ail.xa_target = threshold_lsn; + wake_up_process(mp->m_ail.xa_task); +} + +int +xfsaild( + void *data) +{ + xfs_mount_t *mp = (xfs_mount_t *)data; + xfs_lsn_t last_pushed_lsn = 0; + long tout = 0; + + while (!kthread_should_stop()) { + if (tout) + schedule_timeout_interruptible(msecs_to_jiffies(tout)); + tout = 1000; + + /* swsusp */ + try_to_freeze(); + + ASSERT(mp->m_log); + if (XFS_FORCED_SHUTDOWN(mp)) + continue; + + tout = xfsaild_push(mp, &last_pushed_lsn); + } + + return 0; +} /* xfsaild */ + +int +xfsaild_start( + xfs_mount_t *mp) +{ + mp->m_ail.xa_target = 0; + mp->m_ail.xa_task = kthread_run(xfsaild, mp, "xfsaild"); + if (IS_ERR(mp->m_ail.xa_task)) + return -PTR_ERR(mp->m_ail.xa_task); + return 0; +} + +void +xfsaild_stop( + xfs_mount_t *mp) +{ + kthread_stop(mp->m_ail.xa_task); +} + + + STATIC struct inode * xfs_fs_alloc_inode( struct super_block *sb) Index: 2.6.x-xfs-new/fs/xfs/xfs_log.c =================================================================== --- 
2.6.x-xfs-new.orig/fs/xfs/xfs_log.c 2007-11-22 10:33:05.775490010 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_log.c 2007-11-22 10:34:01.560359200 +1100 @@ -498,11 +498,14 @@ xfs_log_reserve(xfs_mount_t *mp, * Return error or zero. */ int -xfs_log_mount(xfs_mount_t *mp, - xfs_buftarg_t *log_target, - xfs_daddr_t blk_offset, - int num_bblks) +xfs_log_mount( + xfs_mount_t *mp, + xfs_buftarg_t *log_target, + xfs_daddr_t blk_offset, + int num_bblks) { + int error; + if (!(mp->m_flags & XFS_MOUNT_NORECOVERY)) cmn_err(CE_NOTE, "XFS mounting filesystem %s", mp->m_fsname); else { @@ -515,11 +518,21 @@ xfs_log_mount(xfs_mount_t *mp, mp->m_log = xlog_alloc_log(mp, log_target, blk_offset, num_bblks); /* + * Initialize the AIL now we have a log. + */ + spin_lock_init(&mp->m_ail_lock); + error = xfs_trans_ail_init(mp); + if (error) { + cmn_err(CE_WARN, "XFS: AIL initialisation failed: error %d", error); + goto error; + } + + /* * skip log recovery on a norecovery mount. pretend it all * just worked. */ if (!(mp->m_flags & XFS_MOUNT_NORECOVERY)) { - int error, readonly = (mp->m_flags & XFS_MOUNT_RDONLY); + int readonly = (mp->m_flags & XFS_MOUNT_RDONLY); if (readonly) mp->m_flags &= ~XFS_MOUNT_RDONLY; @@ -530,8 +543,7 @@ xfs_log_mount(xfs_mount_t *mp, mp->m_flags |= XFS_MOUNT_RDONLY; if (error) { cmn_err(CE_WARN, "XFS: log mount/recovery failed: error %d", error); - xlog_dealloc_log(mp->m_log); - return error; + goto error; } } @@ -540,6 +552,9 @@ xfs_log_mount(xfs_mount_t *mp, /* End mounting message in xfs_log_mount_finish */ return 0; +error: + xfs_log_unmount_dealloc(mp); + return error; } /* xfs_log_mount */ /* @@ -722,10 +737,14 @@ xfs_log_unmount_write(xfs_mount_t *mp) /* * Deallocate log structures for unmount/relocation. + * + * We need to stop the aild from running before we destroy + * and deallocate the log as the aild references the log. 
*/ void xfs_log_unmount_dealloc(xfs_mount_t *mp) { + xfs_trans_ail_destroy(mp); xlog_dealloc_log(mp->m_log); } Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.c 2007-11-22 10:33:57.732848488 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_mount.c 2007-11-22 10:34:01.560359200 +1100 @@ -137,15 +137,9 @@ xfs_mount_init(void) mp->m_flags |= XFS_MOUNT_NO_PERCPU_SB; } - spin_lock_init(&mp->m_ail_lock); spin_lock_init(&mp->m_sb_lock); mutex_init(&mp->m_ilock); mutex_init(&mp->m_growlock); - /* - * Initialize the AIL. - */ - xfs_trans_ail_init(mp); - atomic_set(&mp->m_active_trans, 0); return mp; Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.h 2007-11-22 10:25:24.974357020 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_mount.h 2007-11-22 10:34:01.560359200 +1100 @@ -219,12 +219,18 @@ extern void xfs_icsb_sync_counters_flags #define xfs_icsb_sync_counters_flags(mp, flags) do { } while (0) #endif +typedef struct xfs_ail { + xfs_ail_entry_t xa_ail; + uint xa_gen; + struct task_struct *xa_task; + xfs_lsn_t xa_target; +} xfs_ail_t; + typedef struct xfs_mount { struct super_block *m_super; xfs_tid_t m_tid; /* next unused tid for fs */ spinlock_t m_ail_lock; /* fs AIL mutex */ - xfs_ail_entry_t m_ail; /* fs active log item list */ - uint m_ail_gen; /* fs AIL generation count */ + xfs_ail_t m_ail; /* fs active log item list */ xfs_sb_t m_sb; /* copy of fs superblock */ spinlock_t m_sb_lock; /* sb counter lock */ struct xfs_buf *m_sb_bp; /* buffer for superblock */ Index: 2.6.x-xfs-new/fs/xfs/xfs_trans.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_trans.h 2007-11-22 10:25:24.978356509 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_trans.h 2007-11-22 10:34:01.564358689 +1100 @@ -992,8 +992,9 @@ int _xfs_trans_commit(xfs_trans_t *, int *); #define 
xfs_trans_commit(tp, flags) _xfs_trans_commit(tp, flags, NULL) void xfs_trans_cancel(xfs_trans_t *, int); -void xfs_trans_ail_init(struct xfs_mount *); -xfs_lsn_t xfs_trans_push_ail(struct xfs_mount *, xfs_lsn_t); +int xfs_trans_ail_init(struct xfs_mount *); +void xfs_trans_ail_destroy(struct xfs_mount *); +void xfs_trans_push_ail(struct xfs_mount *, xfs_lsn_t); xfs_lsn_t xfs_trans_tail_ail(struct xfs_mount *); void xfs_trans_unlocked_item(struct xfs_mount *, xfs_log_item_t *); Index: 2.6.x-xfs-new/fs/xfs/xfs_trans_ail.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_trans_ail.c 2007-11-22 10:25:24.978356509 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_trans_ail.c 2007-11-22 10:34:01.564358689 +1100 @@ -57,7 +57,7 @@ xfs_trans_tail_ail( xfs_log_item_t *lip; spin_lock(&mp->m_ail_lock); - lip = xfs_ail_min(&(mp->m_ail)); + lip = xfs_ail_min(&(mp->m_ail.xa_ail)); if (lip == NULL) { lsn = (xfs_lsn_t)0; } else { @@ -71,119 +71,185 @@ xfs_trans_tail_ail( /* * xfs_trans_push_ail * - * This routine is called to move the tail of the AIL - * forward. It does this by trying to flush items in the AIL - * whose lsns are below the given threshold_lsn. + * This routine is called to move the tail of the AIL forward. It does this by + * trying to flush items in the AIL whose lsns are below the given + * threshold_lsn. * - * The routine returns the lsn of the tail of the log. + * the push is run asynchronously in a separate thread, so we return the tail + * of the log right now instead of the tail after the push. This means we will + * either continue right away, or we will sleep waiting on the async thread to + * do it's work. + * + * We do this unlocked - we only need to know whether there is anything in the + * AIL at the time we are called. We don't need to access the contents of + * any of the objects, so the lock is not needed. 
*/ -xfs_lsn_t +void xfs_trans_push_ail( xfs_mount_t *mp, xfs_lsn_t threshold_lsn) { - xfs_lsn_t lsn; xfs_log_item_t *lip; - int gen; - int restarts; - int lock_result; - int flush_log; -#define XFS_TRANS_PUSH_AIL_RESTARTS 1000 + lip = xfs_ail_min(&mp->m_ail.xa_ail); + if (lip && !XFS_FORCED_SHUTDOWN(mp)) { + if (XFS_LSN_CMP(threshold_lsn, mp->m_ail.xa_target) > 0) + xfsaild_wakeup(mp, threshold_lsn); + } +} + +/* + * Return the item in the AIL with the current lsn. + * Return the current tree generation number for use + * in calls to xfs_trans_next_ail(). + */ +STATIC xfs_log_item_t * +xfs_trans_first_push_ail( + xfs_mount_t *mp, + int *gen, + xfs_lsn_t lsn) +{ + xfs_log_item_t *lip; + + lip = xfs_ail_min(&(mp->m_ail.xa_ail)); + *gen = (int)mp->m_ail.xa_gen; + if (lsn == 0) + return lip; + + while (lip && (XFS_LSN_CMP(lip->li_lsn, lsn) < 0)) + lip = lip->li_ail.ail_forw; + + return lip; +} + +/* + * Function that does the work of pushing on the AIL + */ +long +xfsaild_push( + xfs_mount_t *mp, + xfs_lsn_t *last_lsn) +{ + long tout = 1000; /* milliseconds */ + xfs_lsn_t last_pushed_lsn = *last_lsn; + xfs_lsn_t target = mp->m_ail.xa_target; + xfs_lsn_t lsn; + xfs_log_item_t *lip; + int gen; + int restarts; + int flush_log, count, stuck; + +#define XFS_TRANS_PUSH_AIL_RESTARTS 10 spin_lock(&mp->m_ail_lock); - lip = xfs_trans_first_ail(mp, &gen); - if (lip == NULL || XFS_FORCED_SHUTDOWN(mp)) { + lip = xfs_trans_first_push_ail(mp, &gen, *last_lsn); + if (!lip || XFS_FORCED_SHUTDOWN(mp)) { /* - * Just return if the AIL is empty. + * AIL is empty or our push has reached the end. */ spin_unlock(&mp->m_ail_lock); - return (xfs_lsn_t)0; + last_pushed_lsn = 0; + goto out; } XFS_STATS_INC(xs_push_ail); /* * While the item we are looking at is below the given threshold - * try to flush it out. Make sure to limit the number of times - * we allow xfs_trans_next_ail() to restart scanning from the - * beginning of the list. 
We'd like not to stop until we've at least + * try to flush it out. We'd like not to stop until we've at least * tried to push on everything in the AIL with an LSN less than - * the given threshold. However, we may give up before that if - * we realize that we've been holding the AIL lock for 'too long', - * blocking interrupts. Currently, too long is < 500us roughly. + * the given threshold. + * + * However, we will stop after a certain number of pushes and wait + * for a reduced timeout to fire before pushing further. This + * prevents use from spinning when we can't do anything or there is + * lots of contention on the AIL lists. */ - flush_log = 0; - restarts = 0; - while (((restarts < XFS_TRANS_PUSH_AIL_RESTARTS) && - (XFS_LSN_CMP(lip->li_lsn, threshold_lsn) < 0))) { + tout = 10; + lsn = lip->li_lsn; + flush_log = stuck = count = restarts = 0; + while ((XFS_LSN_CMP(lip->li_lsn, target) < 0)) { + int lock_result; /* - * If we can lock the item without sleeping, unlock - * the AIL lock and flush the item. Then re-grab the - * AIL lock so we can look for the next item on the - * AIL. Since we unlock the AIL while we flush the - * item, the next routine may start over again at the - * the beginning of the list if anything has changed. - * That is what the generation count is for. + * If we can lock the item without sleeping, unlock the AIL + * lock and flush the item. Then re-grab the AIL lock so we + * can look for the next item on the AIL. List changes are + * handled by the AIL lookup functions internally * - * If we can't lock the item, either its holder will flush - * it or it is already being flushed or it is being relogged. - * In any of these case it is being taken care of and we - * can just skip to the next item in the list. + * If we can't lock the item, either its holder will flush it + * or it is already being flushed or it is being relogged. In + * any of these case it is being taken care of and we can just + * skip to the next item in the list. 
*/ lock_result = IOP_TRYLOCK(lip); + spin_unlock(&mp->m_ail_lock); switch (lock_result) { - case XFS_ITEM_SUCCESS: - spin_unlock(&mp->m_ail_lock); + case XFS_ITEM_SUCCESS: XFS_STATS_INC(xs_push_ail_success); IOP_PUSH(lip); - spin_lock(&mp->m_ail_lock); + last_pushed_lsn = lsn; break; - case XFS_ITEM_PUSHBUF: - spin_unlock(&mp->m_ail_lock); + case XFS_ITEM_PUSHBUF: XFS_STATS_INC(xs_push_ail_pushbuf); -#ifdef XFSRACEDEBUG - delay_for_intr(); - delay(300); -#endif - ASSERT(lip->li_ops->iop_pushbuf); - ASSERT(lip); IOP_PUSHBUF(lip); - spin_lock(&mp->m_ail_lock); + last_pushed_lsn = lsn; break; - case XFS_ITEM_PINNED: + case XFS_ITEM_PINNED: XFS_STATS_INC(xs_push_ail_pinned); + stuck++; flush_log = 1; break; - case XFS_ITEM_LOCKED: + case XFS_ITEM_LOCKED: XFS_STATS_INC(xs_push_ail_locked); + last_pushed_lsn = lsn; + stuck++; break; - case XFS_ITEM_FLUSHING: + case XFS_ITEM_FLUSHING: XFS_STATS_INC(xs_push_ail_flushing); + last_pushed_lsn = lsn; + stuck++; break; - default: + default: ASSERT(0); break; } - lip = xfs_trans_next_ail(mp, lip, &gen, &restarts); - if (lip == NULL) { + spin_lock(&mp->m_ail_lock); + /* should we bother continuing? */ + if (XFS_FORCED_SHUTDOWN(mp)) break; - } - if (XFS_FORCED_SHUTDOWN(mp)) { - /* - * Just return if we shut down during the last try. - */ - spin_unlock(&mp->m_ail_lock); - return (xfs_lsn_t)0; - } + ASSERT(mp->m_log); + + count++; + /* + * Are there too many items we can't do anything with? + * If we we are skipping too many items because we can't flush + * them or they are already being flushed, we back off and + * given them time to complete whatever operation is being + * done. i.e. remove pressure from the AIL while we can't make + * progress so traversals don't slow down further inserts and + * removals to/from the AIL. + * + * The value of 100 is an arbitrary magic number based on + * observation. 
+ */ + if (stuck > 100) + break; + + lip = xfs_trans_next_ail(mp, lip, &gen, &restarts); + if (lip == NULL) + break; + if (restarts > XFS_TRANS_PUSH_AIL_RESTARTS) + break; + lsn = lip->li_lsn; } + spin_unlock(&mp->m_ail_lock); if (flush_log) { /* @@ -191,22 +257,35 @@ xfs_trans_push_ail( * push out the log so it will become unpinned and * move forward in the AIL. */ - spin_unlock(&mp->m_ail_lock); XFS_STATS_INC(xs_push_ail_flush); xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE); - spin_lock(&mp->m_ail_lock); } - lip = xfs_ail_min(&(mp->m_ail)); - if (lip == NULL) { - lsn = (xfs_lsn_t)0; - } else { - lsn = lip->li_lsn; + /* + * We reached the target so wait a bit longer for I/O to complete and + * remove pushed items from the AIL before we start the next scan from + * the start of the AIL. + */ + if ((XFS_LSN_CMP(lsn, target) >= 0)) { + tout += 20; + last_pushed_lsn = 0; + } else if ((restarts > XFS_TRANS_PUSH_AIL_RESTARTS) || + (count && ((stuck * 100) / count > 90))) { + /* + * Either there is a lot of contention on the AIL or we + * are stuck due to operations in progress. "Stuck" in this + * case is defined as >90% of the items we tried to push + * were stuck. + * + * Backoff a bit more to allow some I/O to complete before + * continuing from where we were. + */ + tout += 10; } - - spin_unlock(&mp->m_ail_lock); - return lsn; -} /* xfs_trans_push_ail */ +out: + *last_lsn = last_pushed_lsn; + return tout; +} /* xfsaild_push */ /* @@ -247,7 +326,7 @@ xfs_trans_unlocked_item( * the call to xfs_log_move_tail() doesn't do anything if there's * not enough free space to wake people up so we're safe calling it. 
*/ - min_lip = xfs_ail_min(&mp->m_ail); + min_lip = xfs_ail_min(&mp->m_ail.xa_ail); if (min_lip == lip) xfs_log_move_tail(mp, 1); @@ -279,7 +358,7 @@ xfs_trans_update_ail( xfs_log_item_t *dlip=NULL; xfs_log_item_t *mlip; /* ptr to minimum lip */ - ailp = &(mp->m_ail); + ailp = &(mp->m_ail.xa_ail); mlip = xfs_ail_min(ailp); if (lip->li_flags & XFS_LI_IN_AIL) { @@ -292,10 +371,10 @@ xfs_trans_update_ail( lip->li_lsn = lsn; xfs_ail_insert(ailp, lip); - mp->m_ail_gen++; + mp->m_ail.xa_gen++; if (mlip == dlip) { - mlip = xfs_ail_min(&(mp->m_ail)); + mlip = xfs_ail_min(&(mp->m_ail.xa_ail)); spin_unlock(&mp->m_ail_lock); xfs_log_move_tail(mp, mlip->li_lsn); } else { @@ -330,7 +409,7 @@ xfs_trans_delete_ail( xfs_log_item_t *mlip; if (lip->li_flags & XFS_LI_IN_AIL) { - ailp = &(mp->m_ail); + ailp = &(mp->m_ail.xa_ail); mlip = xfs_ail_min(ailp); dlip = xfs_ail_delete(ailp, lip); ASSERT(dlip == lip); @@ -338,10 +417,10 @@ xfs_trans_delete_ail( lip->li_flags &= ~XFS_LI_IN_AIL; lip->li_lsn = 0; - mp->m_ail_gen++; + mp->m_ail.xa_gen++; if (mlip == dlip) { - mlip = xfs_ail_min(&(mp->m_ail)); + mlip = xfs_ail_min(&(mp->m_ail.xa_ail)); spin_unlock(&mp->m_ail_lock); xfs_log_move_tail(mp, (mlip ? 
mlip->li_lsn : 0)); } else { @@ -379,10 +458,10 @@ xfs_trans_first_ail( { xfs_log_item_t *lip; - lip = xfs_ail_min(&(mp->m_ail)); - *gen = (int)mp->m_ail_gen; + lip = xfs_ail_min(&(mp->m_ail.xa_ail)); + *gen = (int)mp->m_ail.xa_gen; - return (lip); + return lip; } /* @@ -402,11 +481,11 @@ xfs_trans_next_ail( xfs_log_item_t *nlip; ASSERT(mp && lip && gen); - if (mp->m_ail_gen == *gen) { - nlip = xfs_ail_next(&(mp->m_ail), lip); + if (mp->m_ail.xa_gen == *gen) { + nlip = xfs_ail_next(&(mp->m_ail.xa_ail), lip); } else { - nlip = xfs_ail_min(&(mp->m_ail)); - *gen = (int)mp->m_ail_gen; + nlip = xfs_ail_min(&(mp->m_ail).xa_ail); + *gen = (int)mp->m_ail.xa_gen; if (restarts != NULL) { XFS_STATS_INC(xs_push_ail_restarts); (*restarts)++; @@ -431,12 +510,20 @@ xfs_trans_next_ail( /* * Initialize the doubly linked list to point only to itself. */ -void +int xfs_trans_ail_init( xfs_mount_t *mp) { - mp->m_ail.ail_forw = (xfs_log_item_t*)&(mp->m_ail); - mp->m_ail.ail_back = (xfs_log_item_t*)&(mp->m_ail); + mp->m_ail.xa_ail.ail_forw = (xfs_log_item_t*)&mp->m_ail.xa_ail; + mp->m_ail.xa_ail.ail_back = (xfs_log_item_t*)&mp->m_ail.xa_ail; + return xfsaild_start(mp); +} + +void +xfs_trans_ail_destroy( + xfs_mount_t *mp) +{ + xfsaild_stop(mp); } /* Index: 2.6.x-xfs-new/fs/xfs/xfs_trans_priv.h =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_trans_priv.h 2007-11-22 10:25:24.982355999 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_trans_priv.h 2007-11-22 10:34:01.568358178 +1100 @@ -57,4 +57,12 @@ struct xfs_log_item *xfs_trans_next_ail( struct xfs_log_item *, int *, int *); +/* + * AIL push thread support + */ +long xfsaild_push(struct xfs_mount *, xfs_lsn_t *); +void xfsaild_wakeup(struct xfs_mount *, xfs_lsn_t); +int xfsaild_start(struct xfs_mount *); +void xfsaild_stop(struct xfs_mount *); + #endif /* __XFS_TRANS_PRIV_H__ */ Index: 2.6.x-xfs-new/fs/xfs/xfsidbg.c =================================================================== --- 
2.6.x-xfs-new.orig/fs/xfs/xfsidbg.c 2007-11-22 10:33:54.001325501 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfsidbg.c 2007-11-22 10:34:01.572357667 +1100 @@ -6220,13 +6220,13 @@ xfsidbg_xaildump(xfs_mount_t *mp) }; int count; - if ((mp->m_ail.ail_forw == NULL) || - (mp->m_ail.ail_forw == (xfs_log_item_t *)&mp->m_ail)) { + if ((mp->m_ail.xa_ail.ail_forw == NULL) || + (mp->m_ail.xa_ail.ail_forw == (xfs_log_item_t *)&mp->m_ail.xa_ail)) { kdb_printf("AIL is empty\n"); return; } kdb_printf("AIL for mp 0x%p, oldest first\n", mp); - lip = (xfs_log_item_t*)mp->m_ail.ail_forw; + lip = (xfs_log_item_t*)mp->m_ail.xa_ail.ail_forw; for (count = 0; lip; count++) { kdb_printf("[%d] type %s ", count, xfsidbg_item_type_str(lip)); printflags((uint)(lip->li_flags), li_flags, "flags:"); @@ -6255,7 +6255,7 @@ xfsidbg_xaildump(xfs_mount_t *mp) break; } - if (lip->li_ail.ail_forw == (xfs_log_item_t*)&mp->m_ail) { + if (lip->li_ail.ail_forw == (xfs_log_item_t*)&mp->m_ail.xa_ail) { lip = NULL; } else { lip = lip->li_ail.ail_forw; @@ -6312,9 +6312,9 @@ xfsidbg_xmount(xfs_mount_t *mp) kdb_printf("xfs_mount at 0x%p\n", mp); kdb_printf("tid 0x%x ail_lock 0x%p &ail 0x%p\n", - mp->m_tid, &mp->m_ail_lock, &mp->m_ail); + mp->m_tid, &mp->m_ail_lock, &mp->m_ail.xa_ail); kdb_printf("ail_gen 0x%x &sb 0x%p\n", - mp->m_ail_gen, &mp->m_sb); + mp->m_ail.xa_gen, &mp->m_sb); kdb_printf("sb_lock 0x%p sb_bp 0x%p dev 0x%x logdev 0x%x rtdev 0x%x\n", &mp->m_sb_lock, mp->m_sb_bp, mp->m_ddev_targp ? 
mp->m_ddev_targp->bt_dev : 0, From owner-xfs@oss.sgi.com Wed Nov 21 16:49:23 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 16:49:28 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-7.3 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.0-r574664 Received: from mx1.suse.de (ns1.suse.de [195.135.220.2]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAM0nInH016510 for ; Wed, 21 Nov 2007 16:49:23 -0800 Received: from Relay1.suse.de (mail2.suse.de [195.135.221.8]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.suse.de (Postfix) with ESMTP id B90E220BD4; Thu, 22 Nov 2007 01:49:25 +0100 (CET) To: David Chinner Cc: xfs-oss , lkml Subject: Re: [PATCH 2/9]: Reduce Log I/O latency From: Andi Kleen References: <20071122003339.GH114266761__34694.2978365861$1195691722$gmane$org@sgi.com> Date: Thu, 22 Nov 2007 01:49:25 +0100 In-Reply-To: <20071122003339.GH114266761__34694.2978365861$1195691722$gmane$org@sgi.com> (David Chinner's message of "Thu\, 22 Nov 2007 11\:33\:39 +1100") Message-ID: User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Virus-Scanned: ClamAV 0.91.2/4874/Wed Nov 21 12:33:19 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13736 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: andi@firstfloor.org Precedence: bulk X-list: xfs David Chinner writes: > To ensure that log I/O is issued as the highest priority I/O, set > the I/O priority of the log I/O to the highest possible. This will > ensure that log I/O is not held up behind bulk data or other > metadata I/O as delaying log I/O can pause the entire transaction > subsystem. 
Introduce a new buffer flag to allow us to tag the log > buffers so we can discriminate when issuing the I/O. Won't that possibly disturb other RT priority users that do not need log IO (e.g. working on preallocated files)? Seems a little dangerous. I suspect you want a "higher than bulk but lower than RT" priority for this really unless there is any block RT priority task waiting for log IO (but keeping track of the latter might be tricky) -Andi From owner-xfs@oss.sgi.com Wed Nov 21 16:50:03 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 16:50:06 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_63 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAM0nwGX016746 for ; Wed, 21 Nov 2007 16:50:01 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA14989; Thu, 22 Nov 2007 11:50:05 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAM0o4dD115562245; Thu, 22 Nov 2007 11:50:04 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAM0o3ZB115419998; Thu, 22 Nov 2007 11:50:03 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Thu, 22 Nov 2007 11:50:03 +1100 From: David Chinner To: xfs-oss Cc: xfs-dev Subject: [PATCH 2/2] Debug - don't exhaustively check the AIL on every operation Message-ID: <20071122005003.GQ114266761@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4874/Wed Nov 21 12:33:19 2007
on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13737 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Checking the entire AIL on every insert and remove is prohibitively expensive - the sustained sequential create rate on a single disk drops from about 1800/s to 60/s because of this checking resulting in the xfslogd becoming CPU bound. By default on debug builds, only check the next and previous entries in the list to ensure they are ordered correctly. If you really want, define XFS_TRANS_DEBUG to use the old behaviour. Signed-off-by: Dave Chinner --- fs/xfs/xfs_trans_ail.c | 37 ++++++++++++++++++++++++++++--------- 1 file changed, 28 insertions(+), 9 deletions(-) Index: 2.6.x-xfs-new/fs/xfs/xfs_trans_ail.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_trans_ail.c 2007-11-22 10:34:01.564358689 +1100 +++ 2.6.x-xfs-new/fs/xfs/xfs_trans_ail.c 2007-11-22 10:34:03.320134239 +1100 @@ -34,9 +34,9 @@ STATIC xfs_log_item_t * xfs_ail_min(xfs_ STATIC xfs_log_item_t * xfs_ail_next(xfs_ail_entry_t *, xfs_log_item_t *); #ifdef DEBUG -STATIC void xfs_ail_check(xfs_ail_entry_t *); +STATIC void xfs_ail_check(xfs_ail_entry_t *, xfs_log_item_t *); #else -#define xfs_ail_check(a) +#define xfs_ail_check(a,l) #endif /* DEBUG */ @@ -563,7 +563,7 @@ xfs_ail_insert( next_lip->li_ail.ail_forw = lip; lip->li_ail.ail_forw->li_ail.ail_back = lip; - xfs_ail_check(base); + xfs_ail_check(base, lip); return; } @@ -577,12 +577,12 @@ xfs_ail_delete( xfs_log_item_t *lip) /* ARGSUSED */ { + xfs_ail_check(base, lip); lip->li_ail.ail_forw->li_ail.ail_back = lip->li_ail.ail_back; lip->li_ail.ail_back->li_ail.ail_forw = lip->li_ail.ail_forw; lip->li_ail.ail_forw = NULL; lip->li_ail.ail_back = NULL; - xfs_ail_check(base); return lip; } @@ -626,13 +626,13 @@ xfs_ail_next( */ STATIC void xfs_ail_check( - xfs_ail_entry_t *base) +
xfs_ail_entry_t *base, + xfs_log_item_t *lip) { - xfs_log_item_t *lip; xfs_log_item_t *prev_lip; - lip = base->ail_forw; - if (lip == (xfs_log_item_t*)base) { + prev_lip = base->ail_forw; + if (prev_lip == (xfs_log_item_t*)base) { /* * Make sure the pointers are correct when the list * is empty. @@ -642,9 +642,27 @@ xfs_ail_check( } /* + * Check the next and previous entries are valid. + */ + ASSERT((lip->li_flags & XFS_LI_IN_AIL) != 0); + prev_lip = lip->li_ail.ail_back; + if (prev_lip != (xfs_log_item_t*)base) { + ASSERT(prev_lip->li_ail.ail_forw == lip); + ASSERT(XFS_LSN_CMP(prev_lip->li_lsn, lip->li_lsn) <= 0); + } + prev_lip = lip->li_ail.ail_forw; + if (prev_lip != (xfs_log_item_t*)base) { + ASSERT(prev_lip->li_ail.ail_back == lip); + ASSERT(XFS_LSN_CMP(prev_lip->li_lsn, lip->li_lsn) >= 0); + } + + +#ifdef XFS_TRANS_DEBUG + /* * Walk the list checking forward and backward pointers, * lsn ordering, and that every entry has the XFS_LI_IN_AIL - * flag set. + * flag set. This is really expensive, so only do it when + * specifically debugging the transaction subsystem. 
*/ prev_lip = (xfs_log_item_t*)base; while (lip != (xfs_log_item_t*)base) { @@ -659,5 +677,6 @@ xfs_ail_check( } ASSERT(lip == (xfs_log_item_t*)base); ASSERT(base->ail_back == prev_lip); +#endif /* XFS_TRANS_DEBUG */ } #endif /* DEBUG */ From owner-xfs@oss.sgi.com Wed Nov 21 17:12:28 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 17:12:31 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAM1CI51019360 for ; Wed, 21 Nov 2007 17:12:25 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA15539; Thu, 22 Nov 2007 12:12:19 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAM1CIdD114978132; Thu, 22 Nov 2007 12:12:19 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAM1CEcc115507960; Thu, 22 Nov 2007 12:12:14 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Thu, 22 Nov 2007 12:12:14 +1100 From: David Chinner To: Andi Kleen Cc: David Chinner , xfs-oss , lkml Subject: Re: [PATCH 2/9]: Reduce Log I/O latency Message-ID: <20071122011214.GR114266761@sgi.com> References: <20071122003339.GH114266761__34694.2978365861$1195691722$gmane$org@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4875/Wed Nov 21 16:08:18 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13738 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com 
Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Thu, Nov 22, 2007 at 01:49:25AM +0100, Andi Kleen wrote: > David Chinner writes: > > > To ensure that log I/O is issued as the highest priority I/O, set > > the I/O priority of the log I/O to the highest possible. This will > > ensure that log I/O is not held up behind bulk data or other > > metadata I/O as delaying log I/O can pause the entire transaction > > subsystem. Introduce a new buffer flag to allow us to tag the log > > buffers so we can discriminate when issuing the I/O. > > Won't that possibly disturb other RT priority users that do not need > log IO (e.g. working on preallocated files)? Seems a little > dangerous. In all the cases that I know of where people are using what could be considered real-time I/O (e.g. media environments where they do real-time ingest and playout from the same filesystem) the real-time ingest processes create the files and do pre-allocation before doing their I/O. This I/O can get held up behind another process that is not real time that has issued log I/O. Given there is no I/O priority inheritance and having log I/O stall will stall the entire filesystem, we cannot allow log I/O to stall in real-time environments. Hence it must have the highest possible priority to prevent this. Cheers, Dave.
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Wed Nov 21 19:11:30 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 19:11:34 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=AWL,BAYES_00, RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.0-r574664 Received: from waste.org (waste.org [66.93.16.53]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAM3BRHh002066 for ; Wed, 21 Nov 2007 19:11:29 -0800 Received: from waste.org (localhost [127.0.0.1]) by waste.org (8.13.8/8.13.8/Debian-3) with ESMTP id lAM2vTY6017596 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Wed, 21 Nov 2007 20:57:29 -0600 Received: (from oxymoron@localhost) by waste.org (8.13.8/8.13.8/Submit) id lAM2vRiW017590; Wed, 21 Nov 2007 20:57:27 -0600 Date: Wed, 21 Nov 2007 20:57:27 -0600 From: Matt Mackall To: David Chinner Cc: Andi Kleen , xfs-oss , lkml Subject: Re: [PATCH 2/9]: Reduce Log I/O latency Message-ID: <20071122025726.GG17536@waste.org> References: <20071122003339.GH114266761__34694.2978365861$1195691722$gmane$org@sgi.com> <20071122011214.GR114266761@sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071122011214.GR114266761@sgi.com> User-Agent: Mutt/1.5.13 (2006-08-11) X-Virus-Scanned: ClamAV 0.91.2/4876/Wed Nov 21 17:22:57 2007 on oss.sgi.com X-Virus-Scanned: by amavisd-new X-Virus-Status: Clean X-archive-position: 13739 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: mpm@selenic.com Precedence: bulk X-list: xfs On Thu, Nov 22, 2007 at 12:12:14PM +1100, David Chinner wrote: > On Thu, Nov 22, 2007 at 01:49:25AM +0100, Andi Kleen wrote: > > David Chinner writes: > > > > > To ensure that log I/O is issued as the highest priority I/O, set > > > the I/O priority of 
the log I/O to the highest possible. This will > > > ensure that log I/O is not held up behind bulk data or other > > > metadata I/O as delaying log I/O can pause the entire transaction > > > subsystem. Introduce a new buffer flag to allow us to tag the log > > > buffers so we can discriminate when issuing the I/O. > > > > Won't that possibly disturb other RT priority users that do not need > > log IO (e.g. working on preallocated files)? Seems a little > > dangerous. > > In all the cases that I know of where ppl are using what could > be considered real-time I/O (e.g. media environments where they > do real-time ingest and playout from the same filesystem) the > real-time ingest processes create the files and do pre-allocation > before doing their I/O. This I/O can get held up behind another > process that is not real time that has issued log I/O. > > Given there is no I/O priority inheritance and having log I/O stall > will stall the entire filesystem, we cannot allow log I/O to > stall in real-time environments. Hence it must have the highest > possible priority to prevent this. I've seen PVRs that would be upset by this. They put media on one filesystem and database/apps/swap/etc. on another, but have everything on a single spindle. Stalling a media filesystem read for a write anywhere else = fail. -- Mathematics is the supreme nostalgia of our time.
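For readers unfamiliar with the mechanism under discussion: the I/O priority argued about in this thread is the one a process sets through the Linux ioprio_set(2) system call. The sketch below is not part of the patch. It only shows, assuming a Linux system whose <sys/syscall.h> defines SYS_ioprio_set, how a user-space process could ask for the real-time I/O class. The constants are restated locally because glibc provides no wrapper for this call, and the helper names ioprio_value() and request_rt_io() are made up for illustration.

```c
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values restated from the kernel's ioprio ABI, see ioprio_set(2). */
enum {
    IOPRIO_CLASS_NONE = 0,
    IOPRIO_CLASS_RT   = 1,  /* real-time class */
    IOPRIO_CLASS_BE   = 2,  /* best-effort class (the default) */
    IOPRIO_CLASS_IDLE = 3
};
enum { IOPRIO_WHO_PROCESS = 1 };
#define IOPRIO_CLASS_SHIFT 13

/* Hypothetical helper: pack a class and a level (0 = highest, 7 = lowest)
 * into the single integer the system call expects. */
int ioprio_value(int io_class, int level)
{
    assert(level >= 0 && level <= 7);
    return (io_class << IOPRIO_CLASS_SHIFT) | level;
}

/* Hypothetical helper: request the RT I/O class for the calling thread.
 * Returns 0 on success, -1 with errno set on failure. */
int request_rt_io(int level)
{
    return (int)syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
                        ioprio_value(IOPRIO_CLASS_RT, level));
}
```

A caller would use request_rt_io(0) to ask for the highest RT level and must check the result, since the RT class normally requires privilege and the call can fail with EPERM.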
From owner-xfs@oss.sgi.com Wed Nov 21 19:41:22 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 19:41:27 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAM3fGdI006551 for ; Wed, 21 Nov 2007 19:41:21 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA19160; Thu, 22 Nov 2007 14:41:17 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAM3fCdD115512232; Thu, 22 Nov 2007 14:41:13 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAM3f6pk97486193; Thu, 22 Nov 2007 14:41:06 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Thu, 22 Nov 2007 14:41:06 +1100 From: David Chinner To: Matt Mackall Cc: David Chinner , Andi Kleen , xfs-oss , lkml Subject: Re: [PATCH 2/9]: Reduce Log I/O latency Message-ID: <20071122034106.GV114266761@sgi.com> References: <20071122003339.GH114266761__34694.2978365861$1195691722$gmane$org@sgi.com> <20071122011214.GR114266761@sgi.com> <20071122025726.GG17536@waste.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071122025726.GG17536@waste.org> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4876/Wed Nov 21 17:22:57 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13740 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Wed, Nov 21, 2007 at 
08:57:27PM -0600, Matt Mackall wrote: > On Thu, Nov 22, 2007 at 12:12:14PM +1100, David Chinner wrote: > > In all the cases that I know of where ppl are using what could > > be considered real-time I/O (e.g. media environments where they > > do real-time ingest and playout from the same filesystem) the > > real-time ingest processes create the files and do pre-allocation > > before doing their I/O. This I/O can get held up behind another > > process that is not real time that has issued log I/O. > > > > Given there is no I/O priority inheritance and having log I/O stall > > will stall the entire filesystem, we cannot allow log I/O to > > stall in real-time environments. Hence it must have the highest > > possible priority to prevent this. > > I've seen PVRs that would be upset by this. They put media on one > filesystem and database/apps/swap/etc. on another, but have everything > on a single spindle. Stalling a media filesystem read for a write > anywhere else = fail. Sounds like the PVR is badly designed to me. If a write can cause a read to miss a playback deadline, then you haven't built enough buffering into your playback application. Cheers, Dave.
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Wed Nov 21 19:57:15 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 19:57:22 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-6.6 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED autolearn=ham version=3.3.0-r574664 Received: from mailgate.mysql.com (mailgate-out2.mysql.com [213.136.52.68]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAM3vDBm008771 for ; Wed, 21 Nov 2007 19:57:14 -0800 Received: from localhost (localhost.localdomain [127.0.0.1]) by mailgate.mysql.com (8.13.8/8.13.8) with ESMTP id lAM3RxAI017831; Thu, 22 Nov 2007 04:27:59 +0100 Received: from mail.mysql.com ([10.222.1.99]) by localhost (mailgate.mysql.com [10.222.1.98]) (amavisd-new, port 10026) with LMTP id 08583-09; Thu, 22 Nov 2007 04:27:59 +0100 (CET) Received: from [192.168.100.109] (b2CBD.static.pacific.net.au [203.100.241.189]) (authenticated bits=0) by mail.mysql.com (8.13.3/8.13.3) with ESMTP id lAM3RpuB015326 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 22 Nov 2007 04:27:55 +0100 Subject: Re: [PATCH 2/9]: Reduce Log I/O latency From: Stewart Smith To: David Chinner Cc: Andi Kleen , xfs-oss , lkml In-Reply-To: <20071122011214.GR114266761@sgi.com> References: <20071122003339.GH114266761__34694.2978365861$1195691722$gmane$org@sgi.com> <20071122011214.GR114266761@sgi.com> Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="=-WAjZXbGIUan1gQIlUjdm" Organization: MySQL AB Date: Thu, 22 Nov 2007 14:28:43 +1100 Message-Id: <1195702123.8369.78.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.12.1 X-Virus-Scanned: ClamAV 0.91.2/4876/Wed Nov 21 17:22:57 2007 on oss.sgi.com X-Virus-Scanned: by amavisd-new at mailgate.mysql.com X-Virus-Status: Clean X-archive-position: 13741 X-ecartis-version: Ecartis 
v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: stewart@mysql.com Precedence: bulk X-list: xfs --=-WAjZXbGIUan1gQIlUjdm Content-Type: text/plain Content-Transfer-Encoding: quoted-printable On Thu, 2007-11-22 at 12:12 +1100, David Chinner wrote: > In all the cases that I know of where ppl are using what could > be considered real-time I/O (e.g. media environments where they > do real-time ingest and playout from the same filesystem) the > real-time ingest processes create the files and do pre-allocation > before doing their I/O. This I/O can get held up behind another > process that is not real time that has issued log I/O. > > Given there is no I/O priority inheritance and having log I/O stall > will stall the entire filesystem, we cannot allow log I/O to > stall in real-time environments. Hence it must have the highest > possible priority to prevent this. FWIW from a "real time" database POV this seems to make sense to me... in fact, we probably rely on filesystem metadata way too much (historically it's just "worked".... although we do seem to get issues on ext3). I have a (casually stupid) simulation program... although I've observed little to no problems on all my XFS tests using it.
-- Stewart Smith, Senior Software Engineer (MySQL Cluster) MySQL AB, www.mysql.com Office: +14082136540 Ext: 6616 VoIP: 6616@sip.us.mysql.com Mobile: +61 4 3 8844 332 --=-WAjZXbGIUan1gQIlUjdm Content-Type: application/pgp-signature; name=signature.asc Content-Description: This is a digitally signed message part -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iQIVAwUAR0T3a73yNwHyU3DLAQLp7Q/+PaPp6lTGjHwPBvP8RvoPzEvajjlj9JhM 5jWPDFv8xJcK2zIMCSvzV7gV4FLVTLT6mX4ZOW3dQ7aItbGEM0KzT9P/T82pjDyf T9j2HRuN45bVEMGuqpkyuwCKuLK+XJv+nKio6+5pZgMpXulWM1xY1pPDaXW/OkP4 ZwgU3CkYkBC0u7bTmCb7fndFmmOtXGXox/4GSTU+J1Ez6SPblcTt66EThCD1ad7d UfsF08LaEnUNoDfWDNSd2WxnD6A3p4EL3KoUla2lBk7a7FyOOV04zeK9LulinqpA KceXVgy2J8BUyFeZdlCI02J8QhMJXpG2qgCjpRlmbjZT+dLLCvES2O6I6NqqCNW+ Z/H00c9TWAhpjuWqga5wz0F0xROGec/Nn5rs/3XKz13HmaKny32Dyv2xm9t/1qFr 4GSkHWQtJrovJwA+A6pBIzIJJ4EUbnVanu4pHZ0gL925dqQRCl/49/GsiGuMBQme gk4Izdf+2sAdQF7lnPAquQXu5g9U19zhLog25jVLr5R4H8gY1hpqKWO3ftC4UOsW fFjV2aGI+CMb9Fg2lhkgeHyvJFy4Rx+5Luh4OkWDdz9aqknJgR+tthGN5LlWUCZJ V3HOOHNJrsKIG/0yZ6u/ek++3/+gIuNn1ZGTLQHgvqGGKmrt0GjR0nagNeB6JdtH uD3ce+3Rq7o= =KdGI -----END PGP SIGNATURE----- --=-WAjZXbGIUan1gQIlUjdm-- From owner-xfs@oss.sgi.com Wed Nov 21 23:25:57 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 21 Nov 2007 23:27:09 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.2 required=5.0 tests=AWL,BAYES_00, RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.0-r574664 Received: from waste.org (waste.org [66.93.16.53]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAM7PrxH003055 for ; Wed, 21 Nov 2007 23:25:57 -0800 Received: from waste.org (localhost [127.0.0.1]) by waste.org (8.13.8/8.13.8/Debian-3) with ESMTP id lAM7PpJc020266 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Thu, 22 Nov 2007 01:25:51 -0600 Received: (from oxymoron@localhost) by waste.org (8.13.8/8.13.8/Submit) id lAM7Pncw020260; Thu,
22 Nov 2007 01:25:49 -0600 Date: Thu, 22 Nov 2007 01:25:49 -0600 From: Matt Mackall To: David Chinner Cc: Andi Kleen , xfs-oss , lkml Subject: Re: [PATCH 2/9]: Reduce Log I/O latency Message-ID: <20071122072549.GQ19691@waste.org> References: <20071122003339.GH114266761__34694.2978365861$1195691722$gmane$org@sgi.com> <20071122011214.GR114266761@sgi.com> <20071122025726.GG17536@waste.org> <20071122034106.GV114266761@sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071122034106.GV114266761@sgi.com> User-Agent: Mutt/1.5.13 (2006-08-11) X-Virus-Scanned: ClamAV 0.91.2/4877/Wed Nov 21 19:03:10 2007 on oss.sgi.com X-Virus-Scanned: by amavisd-new X-Virus-Status: Clean X-archive-position: 13742 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: mpm@selenic.com Precedence: bulk X-list: xfs On Thu, Nov 22, 2007 at 02:41:06PM +1100, David Chinner wrote: > On Wed, Nov 21, 2007 at 08:57:27PM -0600, Matt Mackall wrote: > > On Thu, Nov 22, 2007 at 12:12:14PM +1100, David Chinner wrote: > > > In all the cases that I know of where ppl are using what could > > > be considered real-time I/O (e.g. media environments where they > > > do real-time ingest and playout from the same filesystem) the > > > real-time ingest processes create the files and do pre-allocation > > > before doing their I/O. This I/O can get held up behind another > > > process that is not real time that has issued log I/O. > > > > > > Given there is no I/O priority inheritance and having log I/O stall > > > will stall the entire filesystem, we cannot allow log I/O to > > > stall in real-time environments. Hence it must have the highest > > > possible priority to prevent this. > > > > I've seen PVRs that would be upset by this. They put media on one > > filesystem and database/apps/swap/etc. on another, but have everything > > on a single spindle.
Stalling a media filesystem read for a write > > anywhere else = fail. > > Sounds like the PVR is badly designed to me. If a write can cause a > read to miss a playback deadline, then you haven't built enough > buffering into your playback application. Normally it's not a problem. But your proposed change can push a working system into a non-working system by making non-critical I/O on an unrelated filesystem have higher priority than the thing that -actually has real-time constraints-. In other words, I/O priority is per-spindle and not per-filesystem and thus this change has consequences that leak outside the filesystem in question. That's bad. I'd further add that the kernel internals probably shouldn't wander into RT priority levels unless it's actually doing priority inheritance, otherwise it's quite likely to upset the careful considerations of the RT system designer's priority schemes. For instance, a log-heavy but otherwise non-RT load with this patch could possibly completely starve direct I/O to another partition even though it's marked RT, thus livelocking the system. To the general PVR problem: they typically want to work with a minimum of buffering to maximize responsiveness to user commands (fast forward, jump 30 seconds, play in reverse). Now consider that you're recording and playing back multiple HD streams on low-margin set-top hardware and you'll see that making this work -at all- means lots of I/O tuning. -- Mathematics is the supreme nostalgia of our time. 
From owner-xfs@oss.sgi.com Thu Nov 22 02:32:23 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 22 Nov 2007 02:32:26 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAMAWGuT015153 for ; Thu, 22 Nov 2007 02:32:21 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id VAA26856; Thu, 22 Nov 2007 21:32:07 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAMAW5dD115740950; Thu, 22 Nov 2007 21:32:06 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAMAVxjk115622487; Thu, 22 Nov 2007 21:31:59 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Thu, 22 Nov 2007 21:31:59 +1100 From: David Chinner To: Matt Mackall Cc: David Chinner , Andi Kleen , xfs-oss , lkml Subject: Re: [PATCH 2/9]: Reduce Log I/O latency Message-ID: <20071122103159.GW114266761@sgi.com> References: <20071122003339.GH114266761__34694.2978365861$1195691722$gmane$org@sgi.com> <20071122011214.GR114266761@sgi.com> <20071122025726.GG17536@waste.org> <20071122034106.GV114266761@sgi.com> <20071122072549.GQ19691@waste.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071122072549.GQ19691@waste.org> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4877/Wed Nov 21 19:03:10 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13743 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com 
X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Thu, Nov 22, 2007 at 01:25:49AM -0600, Matt Mackall wrote: > On Thu, Nov 22, 2007 at 02:41:06PM +1100, David Chinner wrote: > > On Wed, Nov 21, 2007 at 08:57:27PM -0600, Matt Mackall wrote: > > > On Thu, Nov 22, 2007 at 12:12:14PM +1100, David Chinner wrote: > > > > In all the cases that I know of where ppl are using what could > > > > be considered real-time I/O (e.g. media environments where they > > > > do real-time ingest and playout from the same filesystem) the > > > > real-time ingest processes create the files and do pre-allocation > > > > before doing their I/O. This I/O can get held up behind another > > > > process that is not real time that has issued log I/O. > > > > > > > > Given there is no I/O priority inheritance and having log I/O stall > > > > will stall the entire filesystem, we cannot allow log I/O to > > > > stall in real-time environments. Hence it must have the highest > > > > possible priority to prevent this. > > > > > > I've seen PVRs that would be upset by this. They put media on one > > > filesystem and database/apps/swap/etc. on another, but have everything > > > on a single spindle. Stalling a media filesystem read for a write > > > anywhere else = fail. > > > > Sounds like the PVR is badly designed to me. If a write can cause a > > read to miss a playback deadline, then you haven't built enough > > buffering into your playback application. > > Normally it's not a problem. But your proposed change can push a > working system into a non-working system by making non-critical I/O on > an unrelated filesystem have higher priority than the thing that -actually > has real-time constraints-. > > In other words, I/O priority is per-spindle and not per-filesystem and > thus this change has consequences that leak outside the filesystem in > question. That's bad.
This has nothing to do with this patch - it's a problem with sharing a single resource in a RT system between two non-deterministic constructs. e.g. I can put two ext3 filesystems on the one spindle, run two completely independent RT workloads on the different filesystems and have one workload DOS the other due to differences in priority at the spindle. That's not a bug in ext3 or the I/O priority mechanism - that's bad system design. Put the filesystems on different spindles and the problem goes away. > I'd further add that the kernel internals probably shouldn't wander > into RT priority levels unless it's actually doing priority > inheritance, otherwise it's quite likely to upset the careful > considerations of the RT system designer's priority schemes. Even if issuing RT I/O will guarantee problems in a RT system? We've put this cool RT I/O prioritisation mechanism in the I/O layer without any consideration of what it means for the filesystems that the I/O must pass through first. The design defines I/O prioritisation from a *process* POV and it ignores the fact that the filesystem might not work effectively under such a prioritisation mechanism. An example, perhaps. If you're smart about the way your application does its multi-stream RT write I/O you preallocate the space and use direct I/O. But even though you've preallocated the space, in XFS you still need transactions to work because you have to mark the extent you just wrote to as written. This conversion happens during I/O completion (i.e. in a workqueue) so it doesn't have the *process* priority to force out log I/O at the same priority as the RT thread. Hence once all the log buffers are queued for I/O, the transaction system blocks all the I/O completion workqueues and all the RT write I/O stops completing and your application, which is doing synchronous direct I/O into preallocated regions hangs.....
Hence the only way to give the log I/O enough priority to be issued is to give the I/O a higher priority than anything that is running at the time. This is not a problem the I/O scheduler can solve - it is a result of the mechanism used to transfer priority from process context to I/O context. The needs of the filesystem are the key thing that is missing here - you can't do RT I/O if the filesystem backs up.... > Now consider that you're > recording and playing back multiple HD streams on low-margin set-top > hardware and you'll see that making this work -at all- means lots of > I/O tuning. Yes, it does. But along the same lines, sustaining multiple uncompressed 2k and 4k streams (i.e. multiple GB/s of throughput) takes a lot of I/O tuning. We had to design a whole new allocator to tune the I/O patterns to make it work.... Basically, we're not optimising XFS for small, embedded systems. We are at the other end of the scale - XFS is optimised for very large, very expensive storage subsystems and hence we often do things that don't make sense for embedded systems... Cheers, Dave.
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Thu Nov 22 04:06:07 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 22 Nov 2007 04:06:11 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.1 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from one.firstfloor.org (one.firstfloor.org [213.235.205.2]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAMC64Hh029699 for ; Thu, 22 Nov 2007 04:06:07 -0800 Received: by one.firstfloor.org (Postfix, from userid 503) id 4B47518902A8; Thu, 22 Nov 2007 13:06:11 +0100 (CET) Date: Thu, 22 Nov 2007 13:06:11 +0100 From: Andi Kleen To: Stewart Smith Cc: David Chinner , Andi Kleen , xfs-oss , lkml Subject: Re: [PATCH 2/9]: Reduce Log I/O latency Message-ID: <20071122120611.GA3573@one.firstfloor.org> References: <20071122003339.GH114266761__34694.2978365861$1195691722$gmane$org@sgi.com> <20071122011214.GR114266761@sgi.com> <1195702123.8369.78.camel@localhost.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1195702123.8369.78.camel@localhost.localdomain> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4883/Thu Nov 22 01:20:36 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13744 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: andi@firstfloor.org Precedence: bulk X-list: xfs > FWIW from a "real time" database POV this seems to make sense to me... > in fact, we probably rely on filesystem metadata way too much > (historically it's just "worked".... although we do seem to get issues > on ext3). For that case you really would need priority inheritance: any metadata IO on behalf of, or blocking, a process needs to use the process' block IO priority.
David's change just fixes a limited set of cases, but breaks others. -Andi From owner-xfs@oss.sgi.com Thu Nov 22 05:15:54 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 22 Nov 2007 05:15:59 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAMDFm3f009971 for ; Thu, 22 Nov 2007 05:15:52 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id AAA00273; Fri, 23 Nov 2007 00:15:48 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAMDFidD115863366; Fri, 23 Nov 2007 00:15:46 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAMDFd8l115891233; Fri, 23 Nov 2007 00:15:39 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Fri, 23 Nov 2007 00:15:39 +1100 From: David Chinner To: Andi Kleen Cc: Stewart Smith , David Chinner , xfs-oss , lkml Subject: Re: [PATCH 2/9]: Reduce Log I/O latency Message-ID: <20071122131539.GX114266761@sgi.com> References: <20071122003339.GH114266761__34694.2978365861$1195691722$gmane$org@sgi.com> <20071122011214.GR114266761@sgi.com> <1195702123.8369.78.camel@localhost.localdomain> <20071122120611.GA3573@one.firstfloor.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071122120611.GA3573@one.firstfloor.org> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4883/Thu Nov 22 01:20:36 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13745 X-ecartis-version: Ecartis v1.0.0 Sender: 
xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Thu, Nov 22, 2007 at 01:06:11PM +0100, Andi Kleen wrote: > > FWIW from a "real time" database POV this seems to make sense to me... > > in fact, we probably rely on filesystem metadata way too much > > (historically it's just "worked".... although we do seem to get issues > > on ext3). > > For that case you really would need priority inheritance: any metadata > IO on behalf of, or blocking, a process needs to use the process' block IO > priority. How do you do that when the processes are blocking on semaphores, mutexes or rw-semaphores in the filesystem three layers removed from the I/O in progress? e.g. a low priority process transaction is holding the AGF buffer locked but the transaction is blocked waiting for some other metadata I/O it has issued needed in the transaction. That metadata I/O is being held out by a higher priority process doing lots of I/O. Another process at the same priority creates a file, requiring inodes to be allocated so it locks the directory into the transaction and later blocks on the AGF buffer semaphore trying to allocate space for the new inode. A very high priority process now comes along and tries to read the directory locked in the create transaction, and blocks on the directory inode ilock because it's already held in write mode. That's three processes all blocked on locks unrelated to the I/O that is being held out, and there is no direct connection that can be used to pass the priority down to the blocked I/O that is causing all the problems..... It's a Bad Idea. Cheers, Dave.
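The scenario above can be modelled in a few lines to show what priority inheritance would have to do: follow the chain of lock owners from the blocked high-priority task down to the task whose I/O is actually stuck, boosting each owner on the way. This is a toy user-space model, not XFS or kernel code. The names struct task, struct lock, MAX_CHAIN and propagate_priority() are hypothetical, and the model assumes every lock records its owner, which is the connection the message argues is not available in practice.

```c
#include <assert.h>
#include <stddef.h>

struct task;

/* Hypothetical lock that remembers which task currently holds it. */
struct lock {
    struct task *owner;
};

/* Hypothetical task: a larger prio value means more important. */
struct task {
    const char *name;
    int prio;
    struct lock *waits_on;  /* NULL when the task is not blocked on a lock */
};

#define MAX_CHAIN 64  /* bound the walk so a lock cycle cannot loop forever */

/* Walk the wait-for chain starting at `waiter`, raising every lock owner
 * on the way to at least the waiter's priority. Returns the last task in
 * the chain, i.e. the one whose outstanding I/O would need the boost. */
struct task *propagate_priority(struct task *waiter)
{
    struct task *t = waiter;
    int hops;

    assert(waiter != NULL);
    for (hops = 0; hops < MAX_CHAIN; hops++) {
        if (t->waits_on == NULL || t->waits_on->owner == NULL)
            break;
        t = t->waits_on->owner;
        if (t->prio < waiter->prio)
            t->prio = waiter->prio;
    }
    return t;
}
```

With the three processes from the message (one holding the AGF buffer and waiting on I/O, one holding the directory and waiting on the AGF, one high-priority reader waiting on the directory), the walk ends at the AGF holder. Even in this idealised model the boost reaches a task, not the queued I/O request itself, so a further step would still be needed to reprioritise I/O already in flight.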
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Thu Nov 22 10:10:35 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 22 Nov 2007 10:10:40 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00, RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.0-r574664 Received: from waste.org (waste.org [66.93.16.53]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAMIAWWm019774 for ; Thu, 22 Nov 2007 10:10:34 -0800 Received:
from waste.org (localhost [127.0.0.1]) by waste.org (8.13.8/8.13.8/Debian-3) with ESMTP id lAMIAVRk031424 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Thu, 22 Nov 2007 12:10:31 -0600 Received: (from oxymoron@localhost) by waste.org (8.13.8/8.13.8/Submit) id lAMIAT3u031416; Thu, 22 Nov 2007 12:10:29 -0600 Date: Thu, 22 Nov 2007 12:10:29 -0600 From: Matt Mackall To: David Chinner Cc: Andi Kleen , xfs-oss , lkml Subject: Re: [PATCH 2/9]: Reduce Log I/O latency Message-ID: <20071122181029.GR19691@waste.org> References: <20071122003339.GH114266761__34694.2978365861$1195691722$gmane$org@sgi.com> <20071122011214.GR114266761@sgi.com> <20071122025726.GG17536@waste.org> <20071122034106.GV114266761@sgi.com> <20071122072549.GQ19691@waste.org> <20071122103159.GW114266761@sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071122103159.GW114266761@sgi.com> User-Agent: Mutt/1.5.13 (2006-08-11) X-Virus-Scanned: ClamAV 0.91.2/4883/Thu Nov 22 01:20:36 2007 on oss.sgi.com X-Virus-Scanned: by amavisd-new X-Virus-Status: Clean X-archive-position: 13747 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: mpm@selenic.com Precedence: bulk X-list: xfs On Thu, Nov 22, 2007 at 09:31:59PM +1100, David Chinner wrote: [...] > > In other words, I/O priority is per-spindle and not per-filesystem and > > thus this change has consequences that leak outside the filesystem in > > question. That's bad. > > This has nothing to do with this patch - it's a problem with sharing > a single resource in a RT system between two non-deterministic > constructs. e.g. I can put two ext3 filesystems on the one spindle, > run two completely independent RT workloads on the different > filesystems and have one workload DOS the other due to differences > in priority at the spindle. Sure. And it's up to the RT system designer not to do something stupid like that. 
The problem is that your patch potentially promotes a non-RT I/O activity to an RT one without regard to the rest of the system. Stop for a moment and look at all the kernel threads on the system. If your argument was sensible, we'd have raised various of these threads to (CPU) RT ages ago. But if we do, we actually royally foul up things that have tried to carefully isolate themselves from SCHED_NORMAL tasks. If a kernel thread preempted our watchdog or our data collection process because it was trying to service a lower priority task, that would be fatally broken. As kernel engineers, we -do not know- the absolute importance of a given subsystem in the wider scheme of things. Thus we have no business promoting anything outside the "normal" range into RT unless explicitly asked to (eg chrt) or if we're actually doing real deadlock avoidance. > That's not a bug in ext3 or the I/O priority mechanism - that's bad > system design. Put the filesystems on different spindles and the > problem goes away. And so do all your PVR sales. Two spindles is economically impossible on a set-top PVR. And I rather expect this all also applies to having XFS volumes on top of LVM + RAID5 along with other filesystems but I haven't looked closely. > > I'd further add that the kernel internals probably shouldn't wander > > into RT priority levels unless it's actually doing priority > > inheritance, otherwise it's quite likely to upset the careful > > considerations of the RT system designer's priority schemes. > > Even if issuing RT I/O will guarantee problems in a RT system? Absolutely (unless we're actually going to do priority inheritance). The only person who can know the real RT requirements of a system is the system's designer. If he wants to boost the priority of XFS I/O threads into RT, he should be allowed to, but it shouldn't happen automatically. Consider someone concurrently running a database on a filesystem and an RT data collection task direct to a separate partition. 
RT I/O may currently allow them to successfully > We've put this cool RT I/O prioritisation mechanism in the I/O layer > without any consideration of what it means for the filesystems that > the I/O must pass through first. The design defines I/O > prioritisation from a *process* POV and it ignores the fact that the > filesystem might not work effectively under such a prioritisation > mechanism. > > An example, perhaps. > > If you're smart about the way your application does its multi-stream > RT write I/O you preallocate the space and use direct I/O. But even > though you've preallocated the space, in XFS you still need > transactions to work because you have to mark the extent you just > wrote to as written. > > This conversion happens during I/O completion (i.e. in a workqueue) > so it doesn't have the *process* priority to force out log I/O at the same > priority as the RT thread. Hence once all the log buffers are queued > for I/O, the transaction system blocks all the I/O completion workqueues > and all the RT write I/O stops completing and your application, which > is doing synchronous direct I/O into preallocated regions hangs..... Perfectly understood. And that's fine. A system designer is allowed to shoot himself in the foot. > Hence the only way to give the log I/O enough priority to be issued > is to give the I/O a higher priority than anything that is running > at the time. > > This is not a problem the I/O scheduler can solve - it is a result > of the mechanism used to transfer priority from process context to > I/O context. The needs of the filesystem are the key thing that is > missing here - you can't do RT I/O if the filesystem backs up.... I don't think there's any fundamental reason the I/O subsystem or filesystems can't be taught to handle priority inversion, which is much more acceptable and general fix.
> > Now consider that you're > > recording and playing back multiple HD streams on low-margin set-top > > hardware and you'll see that making this work -at all- means lots of > > I/O tuning. > > Yes, it does. But along the same lines, sustaining multiple > uncompressed 2k and 4k streams (i.e. multiple GB/s of throughput) > takes a lot of I/O tuning. We had to design a whole new allocator to > tune the I/O patterns to make it work.... ..which makes it fairly attractive to PVR folks until you go mucking with RT behind their backs. If I've got XFS on filesystems A and B on the same spindle (or volume group?) and my real RT I/O takes place only on B, then I want log flushing to happen in RT on B. But -never on A-. If I can do this with a tunable, I'm perfectly happy. -- Mathematics is the supreme nostalgia of our time. From owner-xfs@oss.sgi.com Thu Nov 22 14:29:18 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 22 Nov 2007 14:29:23 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAMMTF9g027104 for ; Thu, 22 Nov 2007 14:29:17 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA10694; Fri, 23 Nov 2007 09:29:16 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAMMTDdD116579251; Fri, 23 Nov 2007 09:29:14 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAMMT9hv116708752; Fri, 23 Nov 2007 09:29:09 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Fri, 23 Nov 2007 
09:29:09 +1100 From: David Chinner To: Matt Mackall Cc: David Chinner , Andi Kleen , xfs-oss , lkml Subject: Re: [PATCH 2/9]: Reduce Log I/O latency Message-ID: <20071122222909.GY114266761@sgi.com> References: <20071122003339.GH114266761__34694.2978365861$1195691722$gmane$org@sgi.com> <20071122011214.GR114266761@sgi.com> <20071122025726.GG17536@waste.org> <20071122034106.GV114266761@sgi.com> <20071122072549.GQ19691@waste.org> <20071122103159.GW114266761@sgi.com> <20071122181029.GR19691@waste.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071122181029.GR19691@waste.org> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4883/Thu Nov 22 01:20:36 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13748 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Thu, Nov 22, 2007 at 12:10:29PM -0600, Matt Mackall wrote: > On Thu, Nov 22, 2007 at 09:31:59PM +1100, David Chinner wrote: > [...] > > > In other words, I/O priority is per-spindle and not per-filesystem and > > > thus this change has consequences that leak outside the filesystem in > > > question. That's bad. > > > > This has nothing to do with this patch - it's a problem with sharing > > a single resource in a RT system between two non-deterministic > > constructs. e.g. I can put two ext3 filesystems on the one spindle, > > run two completely independent RT workloads on the different > > filesystems and have one workload DOS the other due to differences > > in priority at the spindle. > > Sure. And it's up to the RT system designer not to do something stupid > like that. The problem is that your patch potentially promotes a > non-RT I/O activity to an RT one without regard to the rest of the > system. So this: http://marc.info/?l=linux-kernel&m=119247074517414&w=2 shouldn't be allowed, either? 
(rt kjournald for ext3) > Perfectly understood. And that's fine. A system designer is allowed to > shoot himself in the foot. Ok. I'll point anyone that complains at you, Matt ;) > I don't think there's any fundamental reason the I/O subsystem or > filesystems can't be taught to handle priority inversion, which is > much more acceptable and general fix. See my reply to Andi. > If I've got XFS on filesystems A and B on the same spindle (or volume > group?) and my real RT I/O takes place only on B, then I want log > flushing to happen in RT on B. But -never on A-. If I can do this with > a tunable, I'm perfectly happy. No, not another mount option. I'm just going to drop this one for now... Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Thu Nov 22 15:09:35 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 22 Nov 2007 15:09:39 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAMN9W2j032441 for ; Thu, 22 Nov 2007 15:09:34 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA11644; Fri, 23 Nov 2007 10:09:28 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAMN9QdD113980672; Fri, 23 Nov 2007 10:09:26 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAMN9Miq116752225; Fri, 23 Nov 2007 10:09:22 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Fri, 23 Nov 2007 10:09:22 +1100 From: David Chinner To: David Chinner 
Cc: Matt Mackall , Andi Kleen , xfs-oss , lkml Subject: Re: [PATCH 2/9]: Reduce Log I/O latency Message-ID: <20071122230922.GZ114266761@sgi.com> References: <20071122003339.GH114266761__34694.2978365861$1195691722$gmane$org@sgi.com> <20071122011214.GR114266761@sgi.com> <20071122025726.GG17536@waste.org> <20071122034106.GV114266761@sgi.com> <20071122072549.GQ19691@waste.org> <20071122103159.GW114266761@sgi.com> <20071122181029.GR19691@waste.org> <20071122222909.GY114266761@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071122222909.GY114266761@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4884/Thu Nov 22 14:39:38 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13749 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Fri, Nov 23, 2007 at 09:29:09AM +1100, David Chinner wrote: > On Thu, Nov 22, 2007 at 12:10:29PM -0600, Matt Mackall wrote: > > If I've got XFS on filesystems A and B on the same spindle (or volume > > group?) and my real RT I/O takes place only on B, then I want log > > flushing to happen in RT on B. But -never on A-. If I can do this with > > a tunable, I'm perfectly happy. > > No, not another mount option. I'm just going to drop this one for > now... Actually, I might change it to use the highest non-rt priority, which would solve the latency issues in the normal cases and still leave the RT rope dangling for those that want to use it. Is that an acceptable compromise, Matt? Cheers, Dave.
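For readers following the priority discussion, the "highest non-rt priority" mentioned above can be pictured in terms of the ioprio encoding that userspace passes to the ioprio_set(2) syscall. The following is only a rough userspace sketch, not anything taken from the patch: it assumes the usual class numbering (RT = 1, best-effort = 2, idle = 3) with the class stored above a 13-bit level field, and every sketch_* name is hypothetical. glibc provides no wrapper for ioprio_set, so the sketch goes through syscall().

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for syscall() */
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed class numbering; names are hypothetical, chosen for this sketch. */
enum sketch_ioprio_class {
    SKETCH_IOPRIO_CLASS_NONE = 0,
    SKETCH_IOPRIO_CLASS_RT   = 1,
    SKETCH_IOPRIO_CLASS_BE   = 2,   /* best-effort: the "normal" range */
    SKETCH_IOPRIO_CLASS_IDLE = 3
};

#define SKETCH_IOPRIO_CLASS_SHIFT 13    /* assumed width of the level field */
#define SKETCH_IOPRIO_WHO_PROCESS 1     /* assumed "which" value for one task */

/* Pack a class and a level (0 is highest, 7 is lowest) into one value. */
static int sketch_ioprio_value(int ioclass, int level)
{
    assert(level >= 0 && level < 8);
    return (ioclass << SKETCH_IOPRIO_CLASS_SHIFT) | level;
}

/* Recover the class from a packed value. */
static int sketch_ioprio_class_of(int value)
{
    return value >> SKETCH_IOPRIO_CLASS_SHIFT;
}

/*
 * Ask for the highest best-effort (non-RT) I/O priority for one task.
 * pid 0 means the calling thread. Returns 0 on success, -1 on failure.
 */
static int sketch_set_highest_non_rt(pid_t pid)
{
#ifdef SYS_ioprio_set
    return (int)syscall(SYS_ioprio_set, SKETCH_IOPRIO_WHO_PROCESS, pid,
                        sketch_ioprio_value(SKETCH_IOPRIO_CLASS_BE, 0));
#else
    (void)pid;
    return -1;                  /* syscall number not known on this platform */
#endif
}
```

Calling sketch_set_highest_non_rt(0) would request best-effort level 0 for the calling thread. Requesting the RT class instead normally needs elevated privileges, and whether setting a priority on a kernel thread from userspace changes how its log I/O is scheduled is not something this thread establishes.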
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Thu Nov 22 16:21:08 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 22 Nov 2007 16:21:15 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.7 required=5.0 tests=AWL,BAYES_00, RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.0-r574664 Received: from waste.org (waste.org [66.93.16.53]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAN0L15g014761 for ; Thu, 22 Nov 2007 16:21:05 -0800 Received: from waste.org (localhost [127.0.0.1]) by waste.org (8.13.8/8.13.8/Debian-3) with ESMTP id lAN0KbiB020064 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Thu, 22 Nov 2007 18:20:38 -0600 Received: (from oxymoron@localhost) by waste.org (8.13.8/8.13.8/Submit) id lAN0KVeE020055; Thu, 22 Nov 2007 18:20:31 -0600 Date: Thu, 22 Nov 2007 18:20:31 -0600 From: Matt Mackall To: David Chinner Cc: Andi Kleen , xfs-oss , lkml Subject: Re: [PATCH 2/9]: Reduce Log I/O latency Message-ID: <20071123002031.GT19691@waste.org> References: <20071122003339.GH114266761__34694.2978365861$1195691722$gmane$org@sgi.com> <20071122011214.GR114266761@sgi.com> <20071122025726.GG17536@waste.org> <20071122034106.GV114266761@sgi.com> <20071122072549.GQ19691@waste.org> <20071122103159.GW114266761@sgi.com> <20071122181029.GR19691@waste.org> <20071122222909.GY114266761@sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071122222909.GY114266761@sgi.com> User-Agent: Mutt/1.5.13 (2006-08-11) X-Virus-Scanned: ClamAV 0.91.2/4884/Thu Nov 22 14:39:38 2007 on oss.sgi.com X-Virus-Scanned: by amavisd-new X-Virus-Status: Clean X-archive-position: 13750 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: mpm@selenic.com Precedence: bulk X-list: xfs On Fri, Nov 23, 2007 at 09:29:09AM 
+1100, David Chinner wrote: > On Thu, Nov 22, 2007 at 12:10:29PM -0600, Matt Mackall wrote: > > On Thu, Nov 22, 2007 at 09:31:59PM +1100, David Chinner wrote: > > [...] > > > > In other words, I/O priority is per-spindle and not per-filesystem and > > > > thus this change has consequences that leak outside the filesystem in > > > > question. That's bad. > > > > > > This has nothing to do with this patch - it's a problem with sharing > > > a single resource in a RT system between two non-deterministic > > > constructs. e.g. I can put two ext3 filesystems on the one spindle, > > > run two completely independent RT workloads on the different > > > filesystems and have one workload DOS the other due to differences > > > in priority at the spindle. > > > > Sure. And it's up to the RT system designer not to do something stupid > > like that. The problem is that your patch potentially promotes a > > non-RT I/O activity to an RT one without regard to the rest of the > > system. > > So this: > > http://marc.info/?l=linux-kernel&m=119247074517414&w=2 > > shouldn't be allowed, either? (rt kjournald for ext3) No, I think not. If a user wants to manually promote kjournald, that's fine. > > Perfectly understood. And that's fine. A system designer is allowed to > > shoot himself in the foot. > > Ok. I'll point anyone that complains at you, Matt ;) > > > I don't think there's any fundamental reason the I/O subsystem or > > filesystems can't be taught to handle priority inversion, which is > > much more acceptable and general fix. > > See my reply to Andi. I did. And I'll admit it's pretty thorny and I certainly don't know enough about XFS internals to comment further. > > If I've got XFS on filesystems A and B on the same spindle (or volume > > group?) and my real RT I/O takes place only on B, then I want log > > flushing to happen in RT on B. But -never on A-. If I can do this with > > a tunable, I'm perfectly happy. > > No, not another mount option. 
I'm just going to drop this one for > now... I was actually just suggesting allowing a user to do ioprio_set on the appropriate kernel threads. -- Mathematics is the supreme nostalgia of our time. From owner-xfs@oss.sgi.com Thu Nov 22 16:22:02 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 22 Nov 2007 16:22:08 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.8 required=5.0 tests=AWL,BAYES_00, RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.0-r574664 Received: from waste.org (waste.org [66.93.16.53]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAN0Lxth014959 for ; Thu, 22 Nov 2007 16:22:02 -0800 Received: from waste.org (localhost [127.0.0.1]) by waste.org (8.13.8/8.13.8/Debian-3) with ESMTP id lAN0LjDu020194 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Thu, 22 Nov 2007 18:21:46 -0600 Received: (from oxymoron@localhost) by waste.org (8.13.8/8.13.8/Submit) id lAN0Lf7e020185; Thu, 22 Nov 2007 18:21:42 -0600 Date: Thu, 22 Nov 2007 18:21:41 -0600 From: Matt Mackall To: David Chinner Cc: Andi Kleen , xfs-oss , lkml Subject: Re: [PATCH 2/9]: Reduce Log I/O latency Message-ID: <20071123002141.GU19691@waste.org> References: <20071122003339.GH114266761__34694.2978365861$1195691722$gmane$org@sgi.com> <20071122011214.GR114266761@sgi.com> <20071122025726.GG17536@waste.org> <20071122034106.GV114266761@sgi.com> <20071122072549.GQ19691@waste.org> <20071122103159.GW114266761@sgi.com> <20071122181029.GR19691@waste.org> <20071122222909.GY114266761@sgi.com> <20071122230922.GZ114266761@sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071122230922.GZ114266761@sgi.com> User-Agent: Mutt/1.5.13 (2006-08-11) X-Virus-Scanned: ClamAV 0.91.2/4884/Thu Nov 22 14:39:38 2007 on oss.sgi.com X-Virus-Scanned: by amavisd-new X-Virus-Status: Clean X-archive-position: 13751 X-ecartis-version: Ecartis 
v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: mpm@selenic.com Precedence: bulk X-list: xfs On Fri, Nov 23, 2007 at 10:09:22AM +1100, David Chinner wrote: > On Fri, Nov 23, 2007 at 09:29:09AM +1100, David Chinner wrote: > > On Thu, Nov 22, 2007 at 12:10:29PM -0600, Matt Mackall wrote: > > > If I've got XFS on filesystems A and B on the same spindle (or volume > > > group?) and my real RT I/O takes place only on B, then I want log > > > flushing to happen in RT on B. But -never on A-. If I can do this with > > > a tunable, I'm perfectly happy. > > > > No, not another mount option. I'm just going to drop this one for > > now... > > Actually, I might change it to use the highest non-rt priority, which > would solve the latency issues in the normal cases and still leave > the RT rope dangling for those that want to use it. > > Is that an acceptable compromise, Matt? Yes, that's perfectly fine. -- Mathematics is the supreme nostalgia of our time. From owner-xfs@oss.sgi.com Thu Nov 22 16:29:22 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 22 Nov 2007 16:29:25 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.4 required=5.0 tests=AWL,BAYES_20,J_CHICKENPOX_62, J_CHICKENPOX_64 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAN0TBxi016379 for ; Thu, 22 Nov 2007 16:29:15 -0800 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA13851; Fri, 23 Nov 2007 11:29:04 +1100 Message-ID: <47461E87.8010607@sgi.com> Date: Fri, 23 Nov 2007 11:27:51 +1100 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.9 (X11/20071031) MIME-Version: 1.0 To: David Chinner CC: xfs-oss , xfs-dev Subject: Re:
[PATCH 1/2] AIL list threading V2 References: <20071122004643.GP114266761@sgi.com> In-Reply-To: <20071122004643.GP114266761@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4884/Thu Nov 22 14:39:38 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13752 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs Looks good now Dave. David Chinner wrote: > When many hundreds to thousands of threads all try to do simultaneous > transactions and the log is in a tail-pushing situation (i.e. full), > we can get multiple threads walking the AIL list and contending on > the AIL lock. > > Recently we've had two cases of machines basically locking up because > most of the CPUs in the system are trying to obtain the AIL lock. > The first was an 8p machine with ~2,500 kernel threads trying to > do transactions, and the latest is a 2048p altix closing a file per > MPI rank in a synchronised fashion resulting in > 400 processes > all trying to walk and push the AIL at the same time. > > The AIL push is, in effect, a simple I/O dispatch algorithm complicated > by the ordering constraints placed on it by the transaction subsystem. > It really does not need multiple threads to push on it - even when > only a single CPU is pushing the AIL, it can push the I/O out far faster > than pretty much any disk subsystem can handle. > > So, to avoid contention problems stemming from multiple list walkers, > move the list walk off into another thread and simply provide a "target" > to push to. When a thread requires a push, it sets the target and wakes > the push thread, then goes to sleep waiting for the required amount > of space to become available in the log.
> > This mechanism should also be a lot fairer under heavy load as the > waiters will queue in arrival order, rather than queuing in "who completed > a push first" order. > > Also, by moving the pushing to a separate thread we can do overload > detection and prevention more effectively as we can keep context from loop iteration > to loop iteration. That is, we can push only part of the list each loop and not > have to loop back to the start of the list every time we run. This should > also help by reducing the number of items we try to lock and/or push items > that we cannot move. > > Note that this patch is not intended to solve the inefficiencies in the > AIL structure and the associated issues with extremely large list contents. > That needs to be addressed separately; parallel access would cause problems > to any new structure as well, so I'm only aiming to isolate the structure > from unbounded parallelism here. > > Version 2: > > o clean up xfs_trans_push_ail() > o xfs_trans_push_ail() can be done unlocked - the lsn we are returning > is never used so we only need to know if the AIL is not empty before > deciding whether we need to wake up the push thread. > o only check the threshold lsn against the current target once before > waking the aild. > o change checks of mp->m_log to ASSERT()s. Any time this fires it's > indicative of a bug as the aild should only run when there is a log. > o fixed switch indentation in xfsaild_push(). > o initialised restarts variable correctly. > o lengthen idle timeout to 1s. > o return an error from xfs_trans_ail_init() and propagate it to fail > mounting the log. > o add comment to "stuck" checks indicating the source of the magic > numbers. > o pinned items are "stuck".
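The "set a target, wake one pusher, back off when stuck" scheme described above can be illustrated outside the kernel. What follows is a minimal userspace sketch, not the patch's code: it assumes an LSN packs a cycle number in the upper 32 bits and a block number in the lower 32 bits, a plain counter stands in for wake_up_process(), and all sketch_* names are hypothetical. The timeout values (1000 ms when idle, 10 ms base, +20 ms after reaching the target, +10 ms when over 90% of items were stuck or more than 10 restarts occurred) are taken from the diff quoted in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sketch_lsn_t;

/* Assumed layout: cycle in the high 32 bits, block in the low 32 bits. */
static sketch_lsn_t sketch_make_lsn(uint32_t cycle, uint32_t block)
{
    return ((sketch_lsn_t)cycle << 32) | block;
}

/* Three-way compare: cycle first, then block. Returns <0, 0 or >0. */
static int sketch_lsn_cmp(sketch_lsn_t a, sketch_lsn_t b)
{
    uint32_t ca = (uint32_t)(a >> 32), cb = (uint32_t)(b >> 32);
    uint32_t ba = (uint32_t)a, bb = (uint32_t)b;

    if (ca != cb)
        return ca < cb ? -1 : 1;
    if (ba != bb)
        return ba < bb ? -1 : 1;
    return 0;
}

struct sketch_ail {
    sketch_lsn_t target;    /* how far the pusher has been asked to go */
    bool nonempty;          /* is there anything on the list at all? */
    int wakeups;            /* stand-in for waking the push thread */
};

/*
 * Caller side: only advance the target and wake the pusher when the list
 * is non-empty and the new threshold is beyond the current target.
 */
static void sketch_push_request(struct sketch_ail *ail, sketch_lsn_t threshold)
{
    if (ail->nonempty && sketch_lsn_cmp(threshold, ail->target) > 0) {
        ail->target = threshold;
        ail->wakeups++;
    }
}

/* Pusher side: pick the sleep time (ms) before the next partial scan. */
static long sketch_next_timeout_ms(bool ail_empty, bool reached_target,
                                   int restarts, int count, int stuck)
{
    long tout;

    assert(count >= 0 && stuck >= 0 && stuck <= count);
    if (ail_empty)
        return 1000;            /* nothing to do: idle timeout */
    tout = 10;
    if (reached_target)
        tout += 20;             /* let I/O complete before rescanning */
    else if (restarts > 10 || (count && (stuck * 100) / count > 90))
        tout += 10;             /* contention or mostly stuck items */
    return tout;
}
```

In this sketch a second request with a lower threshold than the current target is a no-op, which mirrors the "only check the threshold lsn against the current target once before waking the aild" item in the Version 2 notes.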
> > Signed-Off-By: Dave Chinner > --- > fs/xfs/linux-2.6/xfs_super.c | 59 +++++++++ > fs/xfs/xfs_log.c | 33 ++++- > fs/xfs/xfs_mount.c | 6 > fs/xfs/xfs_mount.h | 10 + > fs/xfs/xfs_trans.h | 5 > fs/xfs/xfs_trans_ail.c | 269 ++++++++++++++++++++++++++++--------------- > fs/xfs/xfs_trans_priv.h | 8 + > fs/xfs/xfsidbg.c | 12 - > 8 files changed, 288 insertions(+), 114 deletions(-) > > Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_super.c > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_super.c 2007-11-22 10:33:51.041703837 +1100 > +++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_super.c 2007-11-22 10:34:01.556359712 +1100 > @@ -51,6 +51,7 @@ > #include "xfs_vfsops.h" > #include "xfs_version.h" > #include "xfs_log_priv.h" > +#include "xfs_trans_priv.h" > > #include > #include > @@ -765,6 +766,64 @@ xfs_blkdev_issue_flush( > blkdev_issue_flush(buftarg->bt_bdev, NULL); > } > > +/* > + * XFS AIL push thread support > + */ > +void > +xfsaild_wakeup( > + xfs_mount_t *mp, > + xfs_lsn_t threshold_lsn) > +{ > + mp->m_ail.xa_target = threshold_lsn; > + wake_up_process(mp->m_ail.xa_task); > +} > + > +int > +xfsaild( > + void *data) > +{ > + xfs_mount_t *mp = (xfs_mount_t *)data; > + xfs_lsn_t last_pushed_lsn = 0; > + long tout = 0; > + > + while (!kthread_should_stop()) { > + if (tout) > + schedule_timeout_interruptible(msecs_to_jiffies(tout)); > + tout = 1000; > + > + /* swsusp */ > + try_to_freeze(); > + > + ASSERT(mp->m_log); > + if (XFS_FORCED_SHUTDOWN(mp)) > + continue; > + > + tout = xfsaild_push(mp, &last_pushed_lsn); > + } > + > + return 0; > +} /* xfsaild */ > + > +int > +xfsaild_start( > + xfs_mount_t *mp) > +{ > + mp->m_ail.xa_target = 0; > + mp->m_ail.xa_task = kthread_run(xfsaild, mp, "xfsaild"); > + if (IS_ERR(mp->m_ail.xa_task)) > + return -PTR_ERR(mp->m_ail.xa_task); > + return 0; > +} > + > +void > +xfsaild_stop( > + xfs_mount_t *mp) > +{ > + kthread_stop(mp->m_ail.xa_task); > +} > + > + > + > STATIC struct 
inode * > xfs_fs_alloc_inode( > struct super_block *sb) > Index: 2.6.x-xfs-new/fs/xfs/xfs_log.c > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_log.c 2007-11-22 10:33:05.775490010 +1100 > +++ 2.6.x-xfs-new/fs/xfs/xfs_log.c 2007-11-22 10:34:01.560359200 +1100 > @@ -498,11 +498,14 @@ xfs_log_reserve(xfs_mount_t *mp, > * Return error or zero. > */ > int > -xfs_log_mount(xfs_mount_t *mp, > - xfs_buftarg_t *log_target, > - xfs_daddr_t blk_offset, > - int num_bblks) > +xfs_log_mount( > + xfs_mount_t *mp, > + xfs_buftarg_t *log_target, > + xfs_daddr_t blk_offset, > + int num_bblks) > { > + int error; > + > if (!(mp->m_flags & XFS_MOUNT_NORECOVERY)) > cmn_err(CE_NOTE, "XFS mounting filesystem %s", mp->m_fsname); > else { > @@ -515,11 +518,21 @@ xfs_log_mount(xfs_mount_t *mp, > mp->m_log = xlog_alloc_log(mp, log_target, blk_offset, num_bblks); > > /* > + * Initialize the AIL now we have a log. > + */ > + spin_lock_init(&mp->m_ail_lock); > + error = xfs_trans_ail_init(mp); > + if (error) { > + cmn_err(CE_WARN, "XFS: AIL initialisation failed: error %d", error); > + goto error; > + } > + > + /* > * skip log recovery on a norecovery mount. pretend it all > * just worked. 
> */ > if (!(mp->m_flags & XFS_MOUNT_NORECOVERY)) { > - int error, readonly = (mp->m_flags & XFS_MOUNT_RDONLY); > + int readonly = (mp->m_flags & XFS_MOUNT_RDONLY); > > if (readonly) > mp->m_flags &= ~XFS_MOUNT_RDONLY; > @@ -530,8 +543,7 @@ xfs_log_mount(xfs_mount_t *mp, > mp->m_flags |= XFS_MOUNT_RDONLY; > if (error) { > cmn_err(CE_WARN, "XFS: log mount/recovery failed: error %d", error); > - xlog_dealloc_log(mp->m_log); > - return error; > + goto error; > } > } > > @@ -540,6 +552,9 @@ xfs_log_mount(xfs_mount_t *mp, > > /* End mounting message in xfs_log_mount_finish */ > return 0; > +error: > + xfs_log_unmount_dealloc(mp); > + return error; > } /* xfs_log_mount */ > > /* > @@ -722,10 +737,14 @@ xfs_log_unmount_write(xfs_mount_t *mp) > > /* > * Deallocate log structures for unmount/relocation. > + * > + * We need to stop the aild from running before we destroy > + * and deallocate the log as the aild references the log. > */ > void > xfs_log_unmount_dealloc(xfs_mount_t *mp) > { > + xfs_trans_ail_destroy(mp); > xlog_dealloc_log(mp->m_log); > } > > Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.c > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.c 2007-11-22 10:33:57.732848488 +1100 > +++ 2.6.x-xfs-new/fs/xfs/xfs_mount.c 2007-11-22 10:34:01.560359200 +1100 > @@ -137,15 +137,9 @@ xfs_mount_init(void) > mp->m_flags |= XFS_MOUNT_NO_PERCPU_SB; > } > > - spin_lock_init(&mp->m_ail_lock); > spin_lock_init(&mp->m_sb_lock); > mutex_init(&mp->m_ilock); > mutex_init(&mp->m_growlock); > - /* > - * Initialize the AIL. 
> - */ > - xfs_trans_ail_init(mp); > - > atomic_set(&mp->m_active_trans, 0); > > return mp; > Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.h > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.h 2007-11-22 10:25:24.974357020 +1100 > +++ 2.6.x-xfs-new/fs/xfs/xfs_mount.h 2007-11-22 10:34:01.560359200 +1100 > @@ -219,12 +219,18 @@ extern void xfs_icsb_sync_counters_flags > #define xfs_icsb_sync_counters_flags(mp, flags) do { } while (0) > #endif > > +typedef struct xfs_ail { > + xfs_ail_entry_t xa_ail; > + uint xa_gen; > + struct task_struct *xa_task; > + xfs_lsn_t xa_target; > +} xfs_ail_t; > + > typedef struct xfs_mount { > struct super_block *m_super; > xfs_tid_t m_tid; /* next unused tid for fs */ > spinlock_t m_ail_lock; /* fs AIL mutex */ > - xfs_ail_entry_t m_ail; /* fs active log item list */ > - uint m_ail_gen; /* fs AIL generation count */ > + xfs_ail_t m_ail; /* fs active log item list */ > xfs_sb_t m_sb; /* copy of fs superblock */ > spinlock_t m_sb_lock; /* sb counter lock */ > struct xfs_buf *m_sb_bp; /* buffer for superblock */ > Index: 2.6.x-xfs-new/fs/xfs/xfs_trans.h > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_trans.h 2007-11-22 10:25:24.978356509 +1100 > +++ 2.6.x-xfs-new/fs/xfs/xfs_trans.h 2007-11-22 10:34:01.564358689 +1100 > @@ -992,8 +992,9 @@ int _xfs_trans_commit(xfs_trans_t *, > int *); > #define xfs_trans_commit(tp, flags) _xfs_trans_commit(tp, flags, NULL) > void xfs_trans_cancel(xfs_trans_t *, int); > -void xfs_trans_ail_init(struct xfs_mount *); > -xfs_lsn_t xfs_trans_push_ail(struct xfs_mount *, xfs_lsn_t); > +int xfs_trans_ail_init(struct xfs_mount *); > +void xfs_trans_ail_destroy(struct xfs_mount *); > +void xfs_trans_push_ail(struct xfs_mount *, xfs_lsn_t); > xfs_lsn_t xfs_trans_tail_ail(struct xfs_mount *); > void xfs_trans_unlocked_item(struct xfs_mount *, > xfs_log_item_t *); > Index: 
2.6.x-xfs-new/fs/xfs/xfs_trans_ail.c > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_trans_ail.c 2007-11-22 10:25:24.978356509 +1100 > +++ 2.6.x-xfs-new/fs/xfs/xfs_trans_ail.c 2007-11-22 10:34:01.564358689 +1100 > @@ -57,7 +57,7 @@ xfs_trans_tail_ail( > xfs_log_item_t *lip; > > spin_lock(&mp->m_ail_lock); > - lip = xfs_ail_min(&(mp->m_ail)); > + lip = xfs_ail_min(&(mp->m_ail.xa_ail)); > if (lip == NULL) { > lsn = (xfs_lsn_t)0; > } else { > @@ -71,119 +71,185 @@ xfs_trans_tail_ail( > /* > * xfs_trans_push_ail > * > - * This routine is called to move the tail of the AIL > - * forward. It does this by trying to flush items in the AIL > - * whose lsns are below the given threshold_lsn. > + * This routine is called to move the tail of the AIL forward. It does this by > + * trying to flush items in the AIL whose lsns are below the given > + * threshold_lsn. > * > - * The routine returns the lsn of the tail of the log. > + * the push is run asynchronously in a separate thread, so we return the tail > + * of the log right now instead of the tail after the push. This means we will > + * either continue right away, or we will sleep waiting on the async thread to > + * do it's work. > + * > + * We do this unlocked - we only need to know whether there is anything in the > + * AIL at the time we are called. We don't need to access the contents of > + * any of the objects, so the lock is not needed. > */ > -xfs_lsn_t > +void > xfs_trans_push_ail( > xfs_mount_t *mp, > xfs_lsn_t threshold_lsn) > { > - xfs_lsn_t lsn; > xfs_log_item_t *lip; > - int gen; > - int restarts; > - int lock_result; > - int flush_log; > > -#define XFS_TRANS_PUSH_AIL_RESTARTS 1000 > + lip = xfs_ail_min(&mp->m_ail.xa_ail); > + if (lip && !XFS_FORCED_SHUTDOWN(mp)) { > + if (XFS_LSN_CMP(threshold_lsn, mp->m_ail.xa_target) > 0) > + xfsaild_wakeup(mp, threshold_lsn); > + } > +} > + > +/* > + * Return the item in the AIL with the current lsn. 
> + * Return the current tree generation number for use > + * in calls to xfs_trans_next_ail(). > + */ > +STATIC xfs_log_item_t * > +xfs_trans_first_push_ail( > + xfs_mount_t *mp, > + int *gen, > + xfs_lsn_t lsn) > +{ > + xfs_log_item_t *lip; > + > + lip = xfs_ail_min(&(mp->m_ail.xa_ail)); > + *gen = (int)mp->m_ail.xa_gen; > + if (lsn == 0) > + return lip; > + > + while (lip && (XFS_LSN_CMP(lip->li_lsn, lsn) < 0)) > + lip = lip->li_ail.ail_forw; > + > + return lip; > +} > + > +/* > + * Function that does the work of pushing on the AIL > + */ > +long > +xfsaild_push( > + xfs_mount_t *mp, > + xfs_lsn_t *last_lsn) > +{ > + long tout = 1000; /* milliseconds */ > + xfs_lsn_t last_pushed_lsn = *last_lsn; > + xfs_lsn_t target = mp->m_ail.xa_target; > + xfs_lsn_t lsn; > + xfs_log_item_t *lip; > + int gen; > + int restarts; > + int flush_log, count, stuck; > + > +#define XFS_TRANS_PUSH_AIL_RESTARTS 10 > > spin_lock(&mp->m_ail_lock); > - lip = xfs_trans_first_ail(mp, &gen); > - if (lip == NULL || XFS_FORCED_SHUTDOWN(mp)) { > + lip = xfs_trans_first_push_ail(mp, &gen, *last_lsn); > + if (!lip || XFS_FORCED_SHUTDOWN(mp)) { > /* > - * Just return if the AIL is empty. > + * AIL is empty or our push has reached the end. > */ > spin_unlock(&mp->m_ail_lock); > - return (xfs_lsn_t)0; > + last_pushed_lsn = 0; > + goto out; > } > > XFS_STATS_INC(xs_push_ail); > > /* > * While the item we are looking at is below the given threshold > - * try to flush it out. Make sure to limit the number of times > - * we allow xfs_trans_next_ail() to restart scanning from the > - * beginning of the list. We'd like not to stop until we've at least > + * try to flush it out. We'd like not to stop until we've at least > * tried to push on everything in the AIL with an LSN less than > - * the given threshold. However, we may give up before that if > - * we realize that we've been holding the AIL lock for 'too long', > - * blocking interrupts. Currently, too long is < 500us roughly. 
> + * the given threshold. > + * > + * However, we will stop after a certain number of pushes and wait > + * for a reduced timeout to fire before pushing further. This > + * prevents use from spinning when we can't do anything or there is > + * lots of contention on the AIL lists. > */ > - flush_log = 0; > - restarts = 0; > - while (((restarts < XFS_TRANS_PUSH_AIL_RESTARTS) && > - (XFS_LSN_CMP(lip->li_lsn, threshold_lsn) < 0))) { > + tout = 10; > + lsn = lip->li_lsn; > + flush_log = stuck = count = restarts = 0; > + while ((XFS_LSN_CMP(lip->li_lsn, target) < 0)) { > + int lock_result; > /* > - * If we can lock the item without sleeping, unlock > - * the AIL lock and flush the item. Then re-grab the > - * AIL lock so we can look for the next item on the > - * AIL. Since we unlock the AIL while we flush the > - * item, the next routine may start over again at the > - * the beginning of the list if anything has changed. > - * That is what the generation count is for. > + * If we can lock the item without sleeping, unlock the AIL > + * lock and flush the item. Then re-grab the AIL lock so we > + * can look for the next item on the AIL. List changes are > + * handled by the AIL lookup functions internally > * > - * If we can't lock the item, either its holder will flush > - * it or it is already being flushed or it is being relogged. > - * In any of these case it is being taken care of and we > - * can just skip to the next item in the list. > + * If we can't lock the item, either its holder will flush it > + * or it is already being flushed or it is being relogged. In > + * any of these case it is being taken care of and we can just > + * skip to the next item in the list. 
> */ > lock_result = IOP_TRYLOCK(lip); > + spin_unlock(&mp->m_ail_lock); > switch (lock_result) { > - case XFS_ITEM_SUCCESS: > - spin_unlock(&mp->m_ail_lock); > + case XFS_ITEM_SUCCESS: > XFS_STATS_INC(xs_push_ail_success); > IOP_PUSH(lip); > - spin_lock(&mp->m_ail_lock); > + last_pushed_lsn = lsn; > break; > > - case XFS_ITEM_PUSHBUF: > - spin_unlock(&mp->m_ail_lock); > + case XFS_ITEM_PUSHBUF: > XFS_STATS_INC(xs_push_ail_pushbuf); > -#ifdef XFSRACEDEBUG > - delay_for_intr(); > - delay(300); > -#endif > - ASSERT(lip->li_ops->iop_pushbuf); > - ASSERT(lip); > IOP_PUSHBUF(lip); > - spin_lock(&mp->m_ail_lock); > + last_pushed_lsn = lsn; > break; > > - case XFS_ITEM_PINNED: > + case XFS_ITEM_PINNED: > XFS_STATS_INC(xs_push_ail_pinned); > + stuck++; > flush_log = 1; > break; > > - case XFS_ITEM_LOCKED: > + case XFS_ITEM_LOCKED: > XFS_STATS_INC(xs_push_ail_locked); > + last_pushed_lsn = lsn; > + stuck++; > break; > > - case XFS_ITEM_FLUSHING: > + case XFS_ITEM_FLUSHING: > XFS_STATS_INC(xs_push_ail_flushing); > + last_pushed_lsn = lsn; > + stuck++; > break; > > - default: > + default: > ASSERT(0); > break; > } > > - lip = xfs_trans_next_ail(mp, lip, &gen, &restarts); > - if (lip == NULL) { > + spin_lock(&mp->m_ail_lock); > + /* should we bother continuing? */ > + if (XFS_FORCED_SHUTDOWN(mp)) > break; > - } > - if (XFS_FORCED_SHUTDOWN(mp)) { > - /* > - * Just return if we shut down during the last try. > - */ > - spin_unlock(&mp->m_ail_lock); > - return (xfs_lsn_t)0; > - } > + ASSERT(mp->m_log); > + > + count++; > > + /* > + * Are there too many items we can't do anything with? > + * If we we are skipping too many items because we can't flush > + * them or they are already being flushed, we back off and > + * given them time to complete whatever operation is being > + * done. i.e. remove pressure from the AIL while we can't make > + * progress so traversals don't slow down further inserts and > + * removals to/from the AIL. 
> + * > + * The value of 100 is an arbitrary magic number based on > + * observation. > + */ > + if (stuck > 100) > + break; > + > + lip = xfs_trans_next_ail(mp, lip, &gen, &restarts); > + if (lip == NULL) > + break; > + if (restarts > XFS_TRANS_PUSH_AIL_RESTARTS) > + break; > + lsn = lip->li_lsn; > } > + spin_unlock(&mp->m_ail_lock); > > if (flush_log) { > /* > @@ -191,22 +257,35 @@ xfs_trans_push_ail( > * push out the log so it will become unpinned and > * move forward in the AIL. > */ > - spin_unlock(&mp->m_ail_lock); > XFS_STATS_INC(xs_push_ail_flush); > xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE); > - spin_lock(&mp->m_ail_lock); > } > > - lip = xfs_ail_min(&(mp->m_ail)); > - if (lip == NULL) { > - lsn = (xfs_lsn_t)0; > - } else { > - lsn = lip->li_lsn; > + /* > + * We reached the target so wait a bit longer for I/O to complete and > + * remove pushed items from the AIL before we start the next scan from > + * the start of the AIL. > + */ > + if ((XFS_LSN_CMP(lsn, target) >= 0)) { > + tout += 20; > + last_pushed_lsn = 0; > + } else if ((restarts > XFS_TRANS_PUSH_AIL_RESTARTS) || > + (count && ((stuck * 100) / count > 90))) { > + /* > + * Either there is a lot of contention on the AIL or we > + * are stuck due to operations in progress. "Stuck" in this > + * case is defined as >90% of the items we tried to push > + * were stuck. > + * > + * Backoff a bit more to allow some I/O to complete before > + * continuing from where we were. > + */ > + tout += 10; > } > - > - spin_unlock(&mp->m_ail_lock); > - return lsn; > -} /* xfs_trans_push_ail */ > +out: > + *last_lsn = last_pushed_lsn; > + return tout; > +} /* xfsaild_push */ > > > /* > @@ -247,7 +326,7 @@ xfs_trans_unlocked_item( > * the call to xfs_log_move_tail() doesn't do anything if there's > * not enough free space to wake people up so we're safe calling it. 
> */ > - min_lip = xfs_ail_min(&mp->m_ail); > + min_lip = xfs_ail_min(&mp->m_ail.xa_ail); > > if (min_lip == lip) > xfs_log_move_tail(mp, 1); > @@ -279,7 +358,7 @@ xfs_trans_update_ail( > xfs_log_item_t *dlip=NULL; > xfs_log_item_t *mlip; /* ptr to minimum lip */ > > - ailp = &(mp->m_ail); > + ailp = &(mp->m_ail.xa_ail); > mlip = xfs_ail_min(ailp); > > if (lip->li_flags & XFS_LI_IN_AIL) { > @@ -292,10 +371,10 @@ xfs_trans_update_ail( > lip->li_lsn = lsn; > > xfs_ail_insert(ailp, lip); > - mp->m_ail_gen++; > + mp->m_ail.xa_gen++; > > if (mlip == dlip) { > - mlip = xfs_ail_min(&(mp->m_ail)); > + mlip = xfs_ail_min(&(mp->m_ail.xa_ail)); > spin_unlock(&mp->m_ail_lock); > xfs_log_move_tail(mp, mlip->li_lsn); > } else { > @@ -330,7 +409,7 @@ xfs_trans_delete_ail( > xfs_log_item_t *mlip; > > if (lip->li_flags & XFS_LI_IN_AIL) { > - ailp = &(mp->m_ail); > + ailp = &(mp->m_ail.xa_ail); > mlip = xfs_ail_min(ailp); > dlip = xfs_ail_delete(ailp, lip); > ASSERT(dlip == lip); > @@ -338,10 +417,10 @@ xfs_trans_delete_ail( > > lip->li_flags &= ~XFS_LI_IN_AIL; > lip->li_lsn = 0; > - mp->m_ail_gen++; > + mp->m_ail.xa_gen++; > > if (mlip == dlip) { > - mlip = xfs_ail_min(&(mp->m_ail)); > + mlip = xfs_ail_min(&(mp->m_ail.xa_ail)); > spin_unlock(&mp->m_ail_lock); > xfs_log_move_tail(mp, (mlip ? 
mlip->li_lsn : 0)); > } else { > @@ -379,10 +458,10 @@ xfs_trans_first_ail( > { > xfs_log_item_t *lip; > > - lip = xfs_ail_min(&(mp->m_ail)); > - *gen = (int)mp->m_ail_gen; > + lip = xfs_ail_min(&(mp->m_ail.xa_ail)); > + *gen = (int)mp->m_ail.xa_gen; > > - return (lip); > + return lip; > } > > /* > @@ -402,11 +481,11 @@ xfs_trans_next_ail( > xfs_log_item_t *nlip; > > ASSERT(mp && lip && gen); > - if (mp->m_ail_gen == *gen) { > - nlip = xfs_ail_next(&(mp->m_ail), lip); > + if (mp->m_ail.xa_gen == *gen) { > + nlip = xfs_ail_next(&(mp->m_ail.xa_ail), lip); > } else { > - nlip = xfs_ail_min(&(mp->m_ail)); > - *gen = (int)mp->m_ail_gen; > + nlip = xfs_ail_min(&(mp->m_ail).xa_ail); > + *gen = (int)mp->m_ail.xa_gen; > if (restarts != NULL) { > XFS_STATS_INC(xs_push_ail_restarts); > (*restarts)++; > @@ -431,12 +510,20 @@ xfs_trans_next_ail( > /* > * Initialize the doubly linked list to point only to itself. > */ > -void > +int > xfs_trans_ail_init( > xfs_mount_t *mp) > { > - mp->m_ail.ail_forw = (xfs_log_item_t*)&(mp->m_ail); > - mp->m_ail.ail_back = (xfs_log_item_t*)&(mp->m_ail); > + mp->m_ail.xa_ail.ail_forw = (xfs_log_item_t*)&mp->m_ail.xa_ail; > + mp->m_ail.xa_ail.ail_back = (xfs_log_item_t*)&mp->m_ail.xa_ail; > + return xfsaild_start(mp); > +} > + > +void > +xfs_trans_ail_destroy( > + xfs_mount_t *mp) > +{ > + xfsaild_stop(mp); > } > > /* > Index: 2.6.x-xfs-new/fs/xfs/xfs_trans_priv.h > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_trans_priv.h 2007-11-22 10:25:24.982355999 +1100 > +++ 2.6.x-xfs-new/fs/xfs/xfs_trans_priv.h 2007-11-22 10:34:01.568358178 +1100 > @@ -57,4 +57,12 @@ struct xfs_log_item *xfs_trans_next_ail( > struct xfs_log_item *, int *, int *); > > > +/* > + * AIL push thread support > + */ > +long xfsaild_push(struct xfs_mount *, xfs_lsn_t *); > +void xfsaild_wakeup(struct xfs_mount *, xfs_lsn_t); > +int xfsaild_start(struct xfs_mount *); > +void xfsaild_stop(struct xfs_mount *); > + > #endif /* 
__XFS_TRANS_PRIV_H__ */ > Index: 2.6.x-xfs-new/fs/xfs/xfsidbg.c > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfsidbg.c 2007-11-22 10:33:54.001325501 +1100 > +++ 2.6.x-xfs-new/fs/xfs/xfsidbg.c 2007-11-22 10:34:01.572357667 +1100 > @@ -6220,13 +6220,13 @@ xfsidbg_xaildump(xfs_mount_t *mp) > }; > int count; > > - if ((mp->m_ail.ail_forw == NULL) || > - (mp->m_ail.ail_forw == (xfs_log_item_t *)&mp->m_ail)) { > + if ((mp->m_ail.xa_ail.ail_forw == NULL) || > + (mp->m_ail.xa_ail.ail_forw == (xfs_log_item_t *)&mp->m_ail.xa_ail)) { > kdb_printf("AIL is empty\n"); > return; > } > kdb_printf("AIL for mp 0x%p, oldest first\n", mp); > - lip = (xfs_log_item_t*)mp->m_ail.ail_forw; > + lip = (xfs_log_item_t*)mp->m_ail.xa_ail.ail_forw; > for (count = 0; lip; count++) { > kdb_printf("[%d] type %s ", count, xfsidbg_item_type_str(lip)); > printflags((uint)(lip->li_flags), li_flags, "flags:"); > @@ -6255,7 +6255,7 @@ xfsidbg_xaildump(xfs_mount_t *mp) > break; > } > > - if (lip->li_ail.ail_forw == (xfs_log_item_t*)&mp->m_ail) { > + if (lip->li_ail.ail_forw == (xfs_log_item_t*)&mp->m_ail.xa_ail) { > lip = NULL; > } else { > lip = lip->li_ail.ail_forw; > @@ -6312,9 +6312,9 @@ xfsidbg_xmount(xfs_mount_t *mp) > > kdb_printf("xfs_mount at 0x%p\n", mp); > kdb_printf("tid 0x%x ail_lock 0x%p &ail 0x%p\n", > - mp->m_tid, &mp->m_ail_lock, &mp->m_ail); > + mp->m_tid, &mp->m_ail_lock, &mp->m_ail.xa_ail); > kdb_printf("ail_gen 0x%x &sb 0x%p\n", > - mp->m_ail_gen, &mp->m_sb); > + mp->m_ail.xa_gen, &mp->m_sb); > kdb_printf("sb_lock 0x%p sb_bp 0x%p dev 0x%x logdev 0x%x rtdev 0x%x\n", > &mp->m_sb_lock, mp->m_sb_bp, > mp->m_ddev_targp ? 
mp->m_ddev_targp->bt_dev : 0, > From owner-xfs@oss.sgi.com Thu Nov 22 16:44:28 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 22 Nov 2007 16:44:30 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_63 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAN0iOIO018577 for ; Thu, 22 Nov 2007 16:44:26 -0800 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA14305; Fri, 23 Nov 2007 11:44:27 +1100 Message-ID: <47462222.9060501@sgi.com> Date: Fri, 23 Nov 2007 11:43:14 +1100 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.9 (X11/20071031) MIME-Version: 1.0 To: David Chinner CC: xfs-oss , xfs-dev Subject: Re: [PATCH 2/2] Debug - don't exhaustively check the AIL on every operation References: <20071122005003.GQ114266761@sgi.com> In-Reply-To: <20071122005003.GQ114266761@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4884/Thu Nov 22 14:39:38 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13753 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs Looks good Dave. There's lots of debug code bound by XFS_TRANS_DEBUG - should we be enabling this in our QA? David Chinner wrote: > Checking the entire AIL on every insert and remove is > prohibitively expensive - the sustained sequntial create rate > on a single disk drops from about 1800/s to 60/s because of > this checking resulting in the xfslogd becoming cpu bound. 
> > By default on debug builds, only check the next and previous > entries in the list to ensure they are ordered correctly. > If you really want, define XFS_TRANS_DEBUG to use the old > behaviour. > > Signed-off-by: Dave Chinner > --- > fs/xfs/xfs_trans_ail.c | 37 ++++++++++++++++++++++++++++--------- > 1 file changed, 28 insertions(+), 9 deletions(-) > > Index: 2.6.x-xfs-new/fs/xfs/xfs_trans_ail.c > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_trans_ail.c 2007-11-22 10:34:01.564358689 +1100 > +++ 2.6.x-xfs-new/fs/xfs/xfs_trans_ail.c 2007-11-22 10:34:03.320134239 +1100 > @@ -34,9 +34,9 @@ STATIC xfs_log_item_t * xfs_ail_min(xfs_ > STATIC xfs_log_item_t * xfs_ail_next(xfs_ail_entry_t *, xfs_log_item_t *); > > #ifdef DEBUG > -STATIC void xfs_ail_check(xfs_ail_entry_t *); > +STATIC void xfs_ail_check(xfs_ail_entry_t *, xfs_log_item_t *); > #else > -#define xfs_ail_check(a) > +#define xfs_ail_check(a,l) > #endif /* DEBUG */ > > > @@ -563,7 +563,7 @@ xfs_ail_insert( > next_lip->li_ail.ail_forw = lip; > lip->li_ail.ail_forw->li_ail.ail_back = lip; > > - xfs_ail_check(base); > + xfs_ail_check(base, lip); > return; > } > > @@ -577,12 +577,12 @@ xfs_ail_delete( > xfs_log_item_t *lip) > /* ARGSUSED */ > { > + xfs_ail_check(base, lip); > lip->li_ail.ail_forw->li_ail.ail_back = lip->li_ail.ail_back; > lip->li_ail.ail_back->li_ail.ail_forw = lip->li_ail.ail_forw; > lip->li_ail.ail_forw = NULL; > lip->li_ail.ail_back = NULL; > > - xfs_ail_check(base); > return lip; > } > > @@ -626,13 +626,13 @@ xfs_ail_next( > */ > STATIC void > xfs_ail_check( > - xfs_ail_entry_t *base) > + xfs_ail_entry_t *base, > + xfs_log_item_t *lip) > { > - xfs_log_item_t *lip; > xfs_log_item_t *prev_lip; > > - lip = base->ail_forw; > - if (lip == (xfs_log_item_t*)base) { > + prev_lip = base->ail_forw; > + if (prev_lip == (xfs_log_item_t*)base) { > /* > * Make sure the pointers are correct when the list > * is empty. 
> @@ -642,9 +642,27 @@ xfs_ail_check( > } > > /* > + * Check the next and previous entries are valid. > + */ > + ASSERT((lip->li_flags & XFS_LI_IN_AIL) != 0); > + prev_lip = lip->li_ail.ail_back; > + if (prev_lip != (xfs_log_item_t*)base) { > + ASSERT(prev_lip->li_ail.ail_forw == lip); > + ASSERT(XFS_LSN_CMP(prev_lip->li_lsn, lip->li_lsn) <= 0); > + } > + prev_lip = lip->li_ail.ail_forw; > + if (prev_lip != (xfs_log_item_t*)base) { > + ASSERT(prev_lip->li_ail.ail_back == lip); > + ASSERT(XFS_LSN_CMP(prev_lip->li_lsn, lip->li_lsn) >= 0); > + } > + > + > +#ifdef XFS_TRANS_DEBUG > + /* > * Walk the list checking forward and backward pointers, > * lsn ordering, and that every entry has the XFS_LI_IN_AIL > - * flag set. > + * flag set. This is really expensive, so only do it when > + * specifically debugging the transaction subsystem. > */ > prev_lip = (xfs_log_item_t*)base; > while (lip != (xfs_log_item_t*)base) { > @@ -659,5 +677,6 @@ xfs_ail_check( > } > ASSERT(lip == (xfs_log_item_t*)base); > ASSERT(base->ail_back == prev_lip); > +#endif /* XFS_TRANS_DEBUG */ > } > #endif /* DEBUG */ > From owner-xfs@oss.sgi.com Thu Nov 22 17:24:34 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 22 Nov 2007 17:24:36 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAN1OTmJ024010 for ; Thu, 22 Nov 2007 17:24:32 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA15427; Fri, 23 Nov 2007 12:24:28 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAN1ORdD116845849; Fri, 23 Nov 2007 12:24:28 
+1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAN1OPRT116867768; Fri, 23 Nov 2007 12:24:25 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Fri, 23 Nov 2007 12:24:25 +1100 From: David Chinner To: Lachlan McIlroy Cc: David Chinner , xfs-oss , xfs-dev Subject: Re: [PATCH 2/2] Debug - don't exhaustively check the AIL on every operation Message-ID: <20071123012425.GA114266761@sgi.com> References: <20071122005003.GQ114266761@sgi.com> <47462222.9060501@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <47462222.9060501@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4885/Thu Nov 22 14:56:18 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13754 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Fri, Nov 23, 2007 at 11:43:14AM +1100, Lachlan McIlroy wrote: > Looks good Dave. > > There's lots of debug code bound by XFS_TRANS_DEBUG - should we be > enabling this in our QA? No, they are more for validation when you are hacking on the transaction code. The current debug code should detect most runtime problems, but if you change the way anything in the logging works you'll be wanting to test your changes with that set (e.g. when we do the transaction rollback stuff). Cheers, Dave.
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Thu Nov 22 18:53:11 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 22 Nov 2007 18:53:15 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.1 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from one.firstfloor.org (one.firstfloor.org [213.235.205.2]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAN2r9tT008773 for ; Thu, 22 Nov 2007 18:53:11 -0800 Received: by one.firstfloor.org (Postfix, from userid 503) id 6467318902A8; Fri, 23 Nov 2007 03:53:17 +0100 (CET) Date: Fri, 23 Nov 2007 03:53:17 +0100 From: Andi Kleen To: David Chinner Cc: Andi Kleen , Stewart Smith , xfs-oss , lkml Subject: Re: [PATCH 2/9]: Reduce Log I/O latency Message-ID: <20071123025317.GA12257@one.firstfloor.org> References: <20071122003339.GH114266761__34694.2978365861$1195691722$gmane$org@sgi.com> <20071122011214.GR114266761@sgi.com> <1195702123.8369.78.camel@localhost.localdomain> <20071122120611.GA3573@one.firstfloor.org> <20071122131539.GX114266761@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071122131539.GX114266761@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4885/Thu Nov 22 14:56:18 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13755 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: andi@firstfloor.org Precedence: bulk X-list: xfs On Fri, Nov 23, 2007 at 12:15:39AM +1100, David Chinner wrote: > On Thu, Nov 22, 2007 at 01:06:11PM +0100, Andi Kleen wrote: > > > FWIW from a "real time" database POV this seems to make sense to me... > > > in fact, we probably rely on filesystem metadata way too much > > > (historically it's just "worked".... although we do seem to get issues > > > on ext3). 
> > > > For that case you really would need priority inheritance: any metadata > > IO on behalf or blocking a process needs to use the process' block IO > > priority. > > How do you do that when the processes are blocking on semaphores, > mutexes or rw-semaphores in the fileysystem three layers removed from > the I/O in progress? [...] I didn't say it was easy (or rather explicitely said it would be tricky). Probably it would be possible to fold it somehow into rt mutexes PI, but it's not easy and semaphores would need to be handled too. Just my point was to solve the metadata RT problem unconditionally increasing the priority is a bad idea and not really a replacement to a "full" solution. Short term a user can just increase the priority of all the XFS threads anyways. -Andi From owner-xfs@oss.sgi.com Thu Nov 22 19:13:04 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 22 Nov 2007 19:13:08 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAN3Cv5O011727 for ; Thu, 22 Nov 2007 19:13:03 -0800 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA17835 for ; Fri, 23 Nov 2007 14:13:05 +1100 Received: by chook.melbourne.sgi.com (Postfix, from userid 44625) id 4E3D758C4C0A; Fri, 23 Nov 2007 14:13:05 +1100 (EST) To: xfs@oss.sgi.com Subject: TAKE 972554 - Clear XBF_READ_AHEAD flag on I/O completion. 
Message-Id: <20071123031305.4E3D758C4C0A@chook.melbourne.sgi.com> Date: Fri, 23 Nov 2007 14:13:05 +1100 (EST) From: lachlan@sgi.com (Lachlan McIlroy) X-Virus-Scanned: ClamAV 0.91.2/4885/Thu Nov 22 14:56:18 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13756 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs Clear XBF_READ_AHEAD flag on I/O completion. Date: Fri Nov 23 14:08:40 AEDT 2007 Workarea: redback.melbourne.sgi.com:/home/lachlan/isms/2.6.x-bufflag Inspected by: hch Author: lachlan The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/linux/2.6.x-xfs-melb Modid: xfs-linux-melb:xfs-kern:30128a fs/xfs/linux-2.6/xfs_buf.c - 1.248 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.6/xfs_buf.c.diff?r1=text&tr1=1.248&r2=text&tr2=1.247&f=h - Clear XBF_READ_AHEAD flag on I/O completion. From owner-xfs@oss.sgi.com Thu Nov 22 20:03:47 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 22 Nov 2007 20:03:52 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAN43hI4024729 for ; Thu, 22 Nov 2007 20:03:46 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA18829; Fri, 23 Nov 2007 15:03:38 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAN43ZdD116690765; Fri, 23 Nov 2007 15:03:36 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAN43Uif116834921; Fri, 23 Nov 2007 
15:03:30 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Fri, 23 Nov 2007 15:03:29 +1100 From: David Chinner To: Andi Kleen Cc: David Chinner , Stewart Smith , xfs-oss , lkml Subject: Re: [PATCH 2/9]: Reduce Log I/O latency Message-ID: <20071123040329.GB114266761@sgi.com> References: <20071122003339.GH114266761__34694.2978365861$1195691722$gmane$org@sgi.com> <20071122011214.GR114266761@sgi.com> <1195702123.8369.78.camel@localhost.localdomain> <20071122120611.GA3573@one.firstfloor.org> <20071122131539.GX114266761@sgi.com> <20071123025317.GA12257@one.firstfloor.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071123025317.GA12257@one.firstfloor.org> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4885/Thu Nov 22 14:56:18 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13757 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Fri, Nov 23, 2007 at 03:53:17AM +0100, Andi Kleen wrote: > On Fri, Nov 23, 2007 at 12:15:39AM +1100, David Chinner wrote: > > On Thu, Nov 22, 2007 at 01:06:11PM +0100, Andi Kleen wrote: > > > > FWIW from a "real time" database POV this seems to make sense to me... > > > > in fact, we probably rely on filesystem metadata way too much > > > > (historically it's just "worked".... although we do seem to get issues > > > > on ext3). > > > > > > For that case you really would need priority inheritance: any metadata > > > IO on behalf or blocking a process needs to use the process' block IO > > > priority. > > > > How do you do that when the processes are blocking on semaphores, > > mutexes or rw-semaphores in the fileysystem three layers removed from > > the I/O in progress? > > [...] I didn't say it was easy (or rather explicitely said it would be tricky). 
> Probably it would be possible to fold it somehow into rt mutexes PI, > but it's not easy and semaphores would need to be handled too. > > Just my point was to solve the metadata RT problem unconditionally increasing > the priority is a bad idea and not really a replacement to a "full" > solution. Short term a user can just increase the priority of all the XFS > threads anyways. The point is that it's not actually a thread-based problem - the priority can't be inherited via the traditional mutex-like manner. There is no connection between a thread and an I/o it has already issued and so you can't transfer a priority from a blocked thread to an issued-but-blocked i/o.... Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Thu Nov 22 20:30:20 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 22 Nov 2007 20:30:24 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00, T_STOX_BOUND_090909_B autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAN4UF9W001101 for ; Thu, 22 Nov 2007 20:30:18 -0800 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA19415; Fri, 23 Nov 2007 15:30:19 +1100 Message-ID: <47465712.1050000@sgi.com> Date: Fri, 23 Nov 2007 15:29:06 +1100 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.9 (X11/20071031) MIME-Version: 1.0 To: xfs-dev , xfs-oss Subject: [PATCH] Fix up xfs_buf_associate_memory() Content-Type: multipart/mixed; boundary="------------030003090509040502010601" X-Virus-Scanned: ClamAV 0.91.2/4885/Thu Nov 22 14:56:18 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13758 X-ecartis-version: Ecartis 
v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs This is a multi-part message in MIME format. --------------030003090509040502010601 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Fixed a few bugs in xfs_buf_associate_memory() including: - calculation of 'page_count' was incorrect as it did not consider the offset of 'mem' into the first page. The logic to bump 'page_count' didn't work if 'len' was <= PAGE_CACHE_SIZE (i.e. offset = 3k, len = 2k). - setting b_buffer_length to 'len' is incorrect if 'offset' is > 0. Set it to the total length of the buffer. - I suspect that passing a non-aligned address into mem_to_page() for the first page may have been causing issues - don't know but just tidy up that code anyway. These fixes prevent a data corruption issue that can occur during log replay. Lachlan --------------030003090509040502010601 Content-Type: text/x-patch; name="xfs_buf.diff" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="xfs_buf.diff" --- fs/xfs/linux-2.6/xfs_buf.c_1.247 2007-11-23 12:03:16.000000000 +1100 +++ fs/xfs/linux-2.6/xfs_buf.c 2007-11-23 12:02:32.000000000 +1100 @@ -726,14 +726,14 @@ xfs_buf_associate_memory( int rval; int i = 0; size_t ptr; - size_t end, end_cur; + size_t buflen; off_t offset; int page_count; - page_count = PAGE_CACHE_ALIGN(len) >> PAGE_CACHE_SHIFT; - offset = (off_t) mem - ((off_t)mem & PAGE_CACHE_MASK); - if (offset && (len > PAGE_CACHE_SIZE)) - page_count++; + ptr = (size_t) mem & PAGE_CACHE_MASK; + offset = (off_t) mem - (off_t) ptr; + buflen = PAGE_CACHE_ALIGN(len + offset); + page_count = buflen >> PAGE_CACHE_SHIFT; /* Free any previous set of page pointers */ if (bp->b_pages) @@ -747,22 +747,15 @@ xfs_buf_associate_memory( return rval; bp->b_offset = offset; - ptr = (size_t) mem & PAGE_CACHE_MASK; - end = PAGE_CACHE_ALIGN((size_t) mem + len); - end_cur = end; - /* set up
first page */ - bp->b_pages[0] = mem_to_page(mem); - - ptr += PAGE_CACHE_SIZE; - bp->b_page_count = ++i; - while (ptr < end) { - bp->b_pages[i] = mem_to_page((void *)ptr); - bp->b_page_count = ++i; + + while (i < bp->b_page_count) { + bp->b_pages[i++] = mem_to_page((void *)ptr); ptr += PAGE_CACHE_SIZE; } bp->b_locked = 0; - bp->b_count_desired = bp->b_buffer_length = len; + bp->b_count_desired = len; + bp->b_buffer_length = buflen; bp->b_flags |= XBF_MAPPED; return 0; --------------030003090509040502010601-- From owner-xfs@oss.sgi.com Thu Nov 22 23:05:54 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 22 Nov 2007 23:05:59 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00, T_STOX_BOUND_090909_B autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAN75mdj018862 for ; Thu, 22 Nov 2007 23:05:52 -0800 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id SAA22471; Fri, 23 Nov 2007 18:05:51 +1100 Message-ID: <47467B87.2000000@sgi.com> Date: Fri, 23 Nov 2007 18:04:39 +1100 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.9 (X11/20071031) MIME-Version: 1.0 To: xfs-dev , xfs-oss Subject: [PATCH, RFC] Delayed logging of file sizes Content-Type: multipart/mixed; boundary="------------070405030101040701060405" X-Virus-Scanned: ClamAV 0.91.2/4885/Thu Nov 22 14:56:18 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13759 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs This is a multi-part message in MIME format. 
--------------070405030101040701060405 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Here's a patch for an idea to address an issue we have with log replay. The problem stems from the fact that not all changes to inodes result in transactions. Some changes (such as file size updates, timestamp updates) would generate too much log traffic if we logged a transaction for every event. So we update the inode and flag it as dirty (ie set i_update_core/i_update_size). If the inode gets logged in a later transaction then the update gets rolled into that transaction and the flag is cleared. If the inode gets flushed to disk before another transaction then the on-disk version of the inode is newer than what is in the log. On log replay we risk overwriting a newer inode on disk with an older version of the inode from the log. We try to prevent this with the i_flushiter counter in the inode (the on-disk inode will have a greater flushiter than the inode in the log) but this does not work for newly created inode cluster buffers. When log replay encounters a newly allocated inode cluster buffer in the log it cannot determine if the on-disk version of the cluster is older, newer or even valid. Since we cannot determine if the on-disk inode cluster has been initialised since it was logged we cannot read it to check the i_flushiter values. If we write out the log record anyway we risk overwriting newer inode data, if we don't write out the log record we risk exposing an uninitialised inode cluster. The easy solution is to log everything so that log replay doesn't need to check if the on-disk version is newer - it can just replay the log. But logging everything would cause too much log traffic so this patch is a compromise and it logs a transaction before we flush an inode to disk only if it has changes that have not yet been logged. 
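To make the ordering problem concrete, here is a rough sketch of the idea in plain C. It is a toy model under loud assumptions, not the patch itself: the struct and every toy_* name are hypothetical, the "log" and "disk" are just fields, and replay is modelled as blindly copying the logged value over the on-disk one, which is the case where the flushiter check cannot help.

```c
/*
 * Toy model only: these are NOT the XFS structures. All names below
 * are hypothetical and exist just to illustrate the ordering problem.
 */
struct toy_inode {
	long	size;		/* in-core file size */
	int	update_core;	/* unlogged change pending */
	long	logged_size;	/* size recorded by the last log record */
	long	disk_size;	/* size last written to disk */
	int	log_records;	/* number of log records written */
};

/* Unlogged update: change the in-core inode and mark it dirty. */
static void toy_set_size(struct toy_inode *ip, long size)
{
	ip->size = size;
	ip->update_core = 1;
}

/* Log the inode only if it has changes the log has not seen yet. */
static void toy_log_inode(struct toy_inode *ip)
{
	if (!ip->update_core)
		return;
	ip->logged_size = ip->size;
	ip->log_records++;
	ip->update_core = 0;	/* cleared when logging, not when flushing */
}

/* Flush to disk, optionally logging pending changes first. */
static void toy_flush(struct toy_inode *ip, int log_first)
{
	if (log_first)
		toy_log_inode(ip);
	ip->disk_size = ip->size;
}

/* Replay that blindly overwrites the on-disk inode with the logged copy. */
static long toy_replay(const struct toy_inode *ip)
{
	return ip->logged_size;
}
```

In this model, three calls to toy_set_size() followed by one toy_flush(ip, 1) write a single log record, and toy_replay() returns the same size that went to disk. With toy_flush(ip, 0) the disk is newer than the log and replay would put back the stale size.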
With this scheme we don't clear the i_update_core flag when flushing an inode - we only do that when logging a transaction. The flag is used to determine if a transaction needs to be done. It needs to be this way because the log pushing code may need to flush an inode and we cannot create a transaction at that point. Anyway it's an idea and needs a little polishing so comments are welcome. Lachlan --------------070405030101040701060405 Content-Type: text/x-patch; name="logsize.diff" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="logsize.diff" --- fs/xfs/xfs_inode.c_1.487 2007-11-23 14:18:58.000000000 +1100 +++ fs/xfs/xfs_inode.c 2007-11-23 14:20:55.000000000 +1100 @@ -3331,21 +3331,6 @@ xfs_iflush_int( dip = (xfs_dinode_t *)xfs_buf_offset(bp, ip->i_boffset); /* - * Clear i_update_core before copying out the data. - * This is for coordination with our timestamp updates - * that don't hold the inode lock. They will always - * update the timestamps BEFORE setting i_update_core, - * so if we clear i_update_core after they set it we - * are guaranteed to see their updates to the timestamps. - * I believe that this depends on strongly ordered memory - * semantics, but we have that. We use the SYNCHRONIZE - * macro to make sure that the compiler does not reorder - * the i_update_core access below the data copy below. - */ - ip->i_update_core = 0; - SYNCHRONIZE(); - - /* * Make sure to get the latest atime from the Linux inode. 
*/ xfs_synchronize_atime(ip); --- fs/xfs/xfs_vfsops.c_1.548 2007-11-23 14:26:52.000000000 +1100 +++ fs/xfs/xfs_vfsops.c 2007-11-23 14:18:03.000000000 +1100 @@ -1162,6 +1162,12 @@ xfs_sync_inodes( if (mount_locked) IPOINTER_INSERT(ip, mp); + if (ip->i_update_core) { + xfs_iunlock(ip, XFS_ILOCK_SHARED); + error = xfs_log_inode(ip, flags & SYNC_WAIT); + xfs_ilock(ip, XFS_ILOCK_SHARED); + } + if (flags & SYNC_WAIT) { xfs_iflock(ip); error = xfs_iflush(ip, XFS_IFLUSH_SYNC); --- fs/xfs/xfs_vnodeops.c_1.726 2007-11-23 14:26:53.000000000 +1100 +++ fs/xfs/xfs_vnodeops.c 2007-11-23 14:18:03.000000000 +1100 @@ -3527,6 +3527,39 @@ xfs_rwunlock( int +xfs_log_inode( + xfs_inode_t *ip, + int sync) +{ + xfs_trans_t *tp; + int error; + + xfs_itrace_entry(ip); + + if (ip->i_update_core == 0) + return 0; + + tp = xfs_trans_alloc(ip->i_mount, XFS_TRANS_FSYNC_TS); + if ((error = xfs_trans_reserve(tp, 0, + XFS_FSYNC_TS_LOG_RES(ip->i_mount), 0, 0, 0))) { + xfs_trans_cancel(tp, 0); + return error; + } + xfs_ilock(ip, XFS_ILOCK_EXCL); + xfs_synchronize_atime(ip); + xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL); + xfs_trans_ihold(tp, ip); + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); + if (sync) + xfs_trans_set_sync(tp); + error = xfs_trans_commit(tp, 0); + xfs_iunlock(ip, XFS_ILOCK_EXCL); + + return error; +} + + +int xfs_inode_flush( xfs_inode_t *ip, int flags) @@ -3579,6 +3612,10 @@ xfs_inode_flush( if (flags & FLUSH_INODE) { int flush_flags; + error = xfs_log_inode(ip, flags & FLUSH_SYNC); + if (error) + return error; + if (flags & FLUSH_SYNC) { xfs_ilock(ip, XFS_ILOCK_SHARED); xfs_iflock(ip); @@ -3751,6 +3788,15 @@ xfs_finish_reclaim( if (ip->i_update_core || ((ip->i_itemp != NULL) && (ip->i_itemp->ili_format.ilf_fields != 0))) { + + if (ip->i_update_core) { + xfs_ifunlock(ip); + xfs_iunlock(ip, XFS_ILOCK_EXCL); + error = xfs_log_inode(ip, sync_mode & XFS_IFLUSH_SYNC); + xfs_ilock(ip, XFS_ILOCK_EXCL); + xfs_iflock(ip); + } + error = xfs_iflush(ip, sync_mode); /* * If we hit an error, 
typically because of filesystem --- fs/xfs/xfs_vnodeops.h_1.4 2007-10-04 14:53:13.000000000 +1000 +++ fs/xfs/xfs_vnodeops.h 2007-10-04 18:11:51.000000000 +1000 @@ -80,4 +80,6 @@ int xfs_flushinval_pages(struct xfs_inod int xfs_flush_pages(struct xfs_inode *ip, xfs_off_t first, xfs_off_t last, uint64_t flags, int fiopt); +int xfs_log_inode(struct xfs_inode *ip, int sync); + #endif /* _XFS_VNODEOPS_H */ --------------070405030101040701060405-- From owner-xfs@oss.sgi.com Fri Nov 23 03:18:35 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 23 Nov 2007 03:18:44 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.3 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from c2beaimr04.btconnect.com (c2beaimr04.btconnect.com [213.123.26.158]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lANBIWs0000966 for ; Fri, 23 Nov 2007 03:18:34 -0800 Received: from localhost (localhost) by c2beaimr04.btconnect.com with internal id ALY03766; Fri, 23 Nov 2007 11:07:30 GMT Date: Fri, 23 Nov 2007 11:07:30 GMT From: Mail Delivery Subsystem Message-Id: <200711231107.ALY03766@c2beaimr04.btconnect.com> To: linux-xfs@oss.sgi.com MIME-Version: 1.0 Content-Type: multipart/report; report-type=delivery-status; boundary="ALY03766.1195816050/c2beaimr04.btconnect.com" Subject: Returned mail: Cannot send message within 1 day Auto-Submitted: auto-generated (failure) X-DSN-Junkmail-Status: score=10/50, host=c2beaimr04.btconnect.com X-DSN-Mirapoint-Virus: VIRUSDELETED; host=c2beaimr04.btconnect.com; attachment=[2.2]; virus=W32/MyDoom-O X-Virus-Scanned: ClamAV 0.91.2/4890/Fri Nov 23 02:34:41 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13760 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: MAILER-DAEMON@c2beaimr04.btconnect.com Precedence: bulk X-list: xfs This is a MIME-encapsulated 
message --ALY03766.1195816050/c2beaimr04.btconnect.com The original message was received at Thu, 22 Nov 2007 10:42:41 GMT from ip-77-242-30-138.net.abissnet.al [77.242.30.138] (may be forged) ----- The following addresses had permanent delivery errors ----- ----- Transcript of session is unavailable ----- --ALY03766.1195816050/c2beaimr04.btconnect.com Content-Type: message/delivery-status Reporting-MTA: dns; c2beaimr04.btconnect.com Arrival-Date: Thu, 22 Nov 2007 10:42:41 GMT Final-Recipient: RFC822; richard@officeequipmentuk.co.uk Action: failed Status: 4.4.7 Remote-MTA: DNS; officeequipmentuk.co.uk Last-Attempt-Date: Fri, 23 Nov 2007 11:07:30 GMT --ALY03766.1195816050/c2beaimr04.btconnect.com Content-Type: message/rfc822 Received: from oss.sgi.com (ip-77-242-30-138.net.abissnet.al [77.242.30.138] (may be forged)) by c2beaimr04.btconnect.com with ESMTP id ALS42610; Thu, 22 Nov 2007 10:42:32 GMT Message-Id: <200711221042.ALS42610@c2beaimr04.btconnect.com> From: linux-xfs@oss.sgi.com To: richard@officeequipmentuk.co.uk Subject: Delivery reports about your e-mail Date: Wed, 21 Nov 2040 11:59:42 +0100 X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2600.0000 X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2600.0000 X-Mirapoint-Virus: VIRUSDELETED; host=c2beaimr04.btconnect.com; attachment=[2.2]; virus=W32/MyDoom-O X-Junkmail-Status: score=10/50, host=c2beaimr04.btconnect.com X-Junkmail-SD-Raw: score=unknown, refid=str=0001.0A0B0201.47455B26.024B,ss=1,fgs=0, ip=77.242.30.138, so=2006-12-09 10:45:40, dmn=5.4.3/2007-10-18 --ALY03766.1195816050/c2beaimr04.btconnect.com-- From owner-xfs@oss.sgi.com Fri Nov 23 04:01:09 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 23 Nov 2007 04:01:14 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.1 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from one.firstfloor.org 
(one.firstfloor.org [213.235.205.2]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lANC17Uf006507 for ; Fri, 23 Nov 2007 04:01:09 -0800 Received: by one.firstfloor.org (Postfix, from userid 503) id 54E0D18902A8; Fri, 23 Nov 2007 13:01:15 +0100 (CET) Date: Fri, 23 Nov 2007 13:01:15 +0100 From: Andi Kleen To: David Chinner Cc: Andi Kleen , Stewart Smith , xfs-oss , lkml Subject: Re: [PATCH 2/9]: Reduce Log I/O latency Message-ID: <20071123120115.GA18532@one.firstfloor.org> References: <20071122003339.GH114266761__34694.2978365861$1195691722$gmane$org@sgi.com> <20071122011214.GR114266761@sgi.com> <1195702123.8369.78.camel@localhost.localdomain> <20071122120611.GA3573@one.firstfloor.org> <20071122131539.GX114266761@sgi.com> <20071123025317.GA12257@one.firstfloor.org> <20071123040329.GB114266761@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071123040329.GB114266761@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4890/Fri Nov 23 02:34:41 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13761 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: andi@firstfloor.org Precedence: bulk X-list: xfs On Fri, Nov 23, 2007 at 03:03:29PM +1100, David Chinner wrote: > On Fri, Nov 23, 2007 at 03:53:17AM +0100, Andi Kleen wrote: > > On Fri, Nov 23, 2007 at 12:15:39AM +1100, David Chinner wrote: > > > On Thu, Nov 22, 2007 at 01:06:11PM +0100, Andi Kleen wrote: > > > > > FWIW from a "real time" database POV this seems to make sense to me... > > > > > in fact, we probably rely on filesystem metadata way too much > > > > > (historically it's just "worked".... although we do seem to get issues > > > > > on ext3). > > > > > > > > For that case you really would need priority inheritance: any metadata > > > > IO on behalf or blocking a process needs to use the process' block IO > > > > priority. 
> > >
> > > How do you do that when the processes are blocking on semaphores,
> > > mutexes or rw-semaphores in the filesystem three layers removed from
> > > the I/O in progress?
> >
> > [...] I didn't say it was easy (or rather explicitly said it would be tricky).
> > Probably it would be possible to fold it somehow into rt mutexes PI,
> > but it's not easy and semaphores would need to be handled too.
> >
> > Just my point was to solve the metadata RT problem unconditionally increasing
> > the priority is a bad idea and not really a replacement to a "full"
> > solution. Short term a user can just increase the priority of all the XFS
> > threads anyways.
>
> The point is that it's not actually a thread-based problem - the priority
> can't be inherited via the traditional mutex-like manner. There is no
> connection between a thread and an I/O it has already issued and so you
> can't transfer a priority from a blocked thread to an issued-but-blocked
> I/O....

It could be handled in theory similar to standard CPU priority inheritance --
keep track of IO priority of all threads you block and boost your IO priority
always to that level. But it would probably not be very easy to do.
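
A rough sketch of that bookkeeping, purely as an illustration: struct toy_task, toy_block_on() and toy_effective_ioprio() are hypothetical names, and the model assumes a larger number means more urgent I/O, which is not how the kernel encodes ioprio. It also does nothing about I/O that has already been issued, which is the harder part raised in the quoted text.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_WAITERS	8

/* Hypothetical model: larger value means more urgent I/O. */
struct toy_task {
	int	base_ioprio;			/* priority the task asked for */
	int	waiter_ioprio[TOY_MAX_WAITERS];	/* priorities of tasks blocked on us */
	size_t	nr_waiters;
};

/* Record that a task with the given I/O priority is now blocked on holder. */
static int
toy_block_on(
	struct toy_task	*holder,
	int		waiter_ioprio)
{
	assert(holder != NULL);
	if (holder->nr_waiters == TOY_MAX_WAITERS)
		return -1;
	holder->waiter_ioprio[holder->nr_waiters++] = waiter_ioprio;
	return 0;
}

/* The holder issues I/O at the most urgent priority among itself and its waiters. */
static int
toy_effective_ioprio(
	const struct toy_task	*holder)
{
	int	prio;
	size_t	i;

	assert(holder != NULL);
	prio = holder->base_ioprio;
	for (i = 0; i < holder->nr_waiters; i++) {
		if (holder->waiter_ioprio[i] > prio)
			prio = holder->waiter_ioprio[i];
	}
	return prio;
}
```

Any new I/O the holder submits would use toy_effective_ioprio(); requests already queued keep whatever priority they were issued with.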
-Andi From owner-xfs@oss.sgi.com Fri Nov 23 04:24:44 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 23 Nov 2007 04:24:55 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-6.8 required=5.0 tests=AWL,BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.0-r574664 Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lANCOaxQ014561 for ; Fri, 23 Nov 2007 04:24:43 -0800 Received: from Relay2.suse.de (mail2.suse.de [195.135.221.8]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx2.suse.de (Postfix) with ESMTP id 17CF22C1D5; Fri, 23 Nov 2007 13:24:42 +0100 (CET) From: Andreas Gruenbacher Organization: SUSE Labs To: Timothy Shimmin Subject: Re: acl and attr: Fix path walking code Date: Fri, 23 Nov 2007 13:24:40 +0100 User-Agent: KMail/1.9.6 (enterprise 20070904.708012) Cc: linux-xfs@oss.sgi.com, Gerald Bringhurst , Brandon Philips References: <200710281858.24428.agruen@suse.de> <473A82E6.50709@sgi.com> <47426C70.3070704@sgi.com> In-Reply-To: <47426C70.3070704@sgi.com> MIME-Version: 1.0 Content-Disposition: inline X-Length: 2744 X-UID: 323 Content-Type: Multipart/Mixed; boundary="Boundary-00=_IasRH48hyGEZElM" Message-Id: <200711231324.40494.agruen@suse.de> X-Virus-Scanned: ClamAV 0.91.2/4890/Fri Nov 23 02:34:41 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13762 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: agruen@suse.de Precedence: bulk X-list: xfs --Boundary-00=_IasRH48hyGEZElM Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline On Tuesday 20 November 2007 06:11:12 Timothy Shimmin wrote: > Okay, looked at the code. > > --- > > no -h => stat, getxattr, listxattr > -h => lstat, lgetxattr, llistxattr Yes. 
> -P => skip symlinks (as soon as see them, then return from place in walk) No, -h never skips symlinks. (But depending on -L and -P, it may not follow symlinks to directories.) Here is an additional comment for do_print, and an equivalent version of the if in there. I hope that this will finally clarify the code. > it would be nicer to have more explanation in the man page. Agreed. How about the attached manpage patches? Thanks, Andreas --Boundary-00=_IasRH48hyGEZElM Content-Type: text/x-diff; charset="iso-8859-1"; name="equivalent-version.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="equivalent-version.diff" Index: attr-2.4.39/getfattr/getfattr.c =================================================================== --- attr-2.4.39.orig/getfattr/getfattr.c +++ attr-2.4.39/getfattr/getfattr.c @@ -355,14 +355,15 @@ int do_print(const char *path, const str return 1; } - /* - * When doing a physical walk or neither doing a logical walk nor processing a - * direct command like argument, do not dereference symlinks. + /* + * Only dereference symlinks when doing a logical walk, or when procesing + * a direct command-line argument while not doing a physical walk. 
*/ if ((walk_flags & WALK_TREE_SYMLINK) && (walk_flags & WALK_TREE_DEREFERENCE) && + !(walk_flags & WALK_TREE_LOGICAL) && ((walk_flags & WALK_TREE_PHYSICAL) || - !(walk_flags & (WALK_TREE_TOPLEVEL | WALK_TREE_LOGICAL)))) + !(walk_flags & (WALK_TREE_TOPLEVEL)))) return 0; if (opt_name) --Boundary-00=_IasRH48hyGEZElM Content-Type: text/x-diff; charset="iso-8859-1"; name="comment.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="comment.diff" Index: attr-2.4.39/getfattr/getfattr.c =================================================================== --- attr-2.4.39.orig/getfattr/getfattr.c +++ attr-2.4.39/getfattr/getfattr.c @@ -355,6 +355,10 @@ int do_print(const char *path, const str return 1; } + /* + * When doing a physical walk or neither doing a logical walk nor processing a + * direct command like argument, do not dereference symlinks. + */ if ((walk_flags & WALK_TREE_SYMLINK) && (walk_flags & WALK_TREE_DEREFERENCE) && ((walk_flags & WALK_TREE_PHYSICAL) || --Boundary-00=_IasRH48hyGEZElM Content-Type: text/x-diff; charset="iso-8859-1"; name="man-acl.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="man-acl.diff" Index: acl-2.2.45/man/man1/getfacl.1 =================================================================== --- acl-2.2.45.orig/man/man1/getfacl.1 +++ acl-2.2.45/man/man1/getfacl.1 @@ -100,13 +100,14 @@ Skip files that only have the base ACL e List the ACLs of all files and directories recursively. .TP .I \-L, \-\-logical -Logical walk, follow symbolic links. The default behavior is to follow -symbolic link arguments, and to skip symbolic links encountered in -subdirectories. +Logical walk, follow symbolic links to directories. The default behavior is to follow +symbolic link arguments, and skip symbolic links encountered in subdirectories. +Only effective in combination with \-R. .TP .I \-P, \-\-physical -Physical walk, skip all symbolic links. This also skips symbolic link -arguments. 
+Physical walk, do not follow symbolic links to directories. This also skips symbolic +link arguments. +Only effective in combination with \-R. .TP .I \-\-tabular Use an alternative tabular output format. The ACL and the default ACL are displayed side by side. Permissions that are ineffective due to the ACL mask entry are displayed capitalized. The entry tag names for the ACL_USER_OBJ and ACL_GROUP_OBJ entries are also displayed in capital letters, which helps in spotting those entries. Index: acl-2.2.45/man/man1/setfacl.1 =================================================================== --- acl-2.2.45.orig/man/man1/setfacl.1 +++ acl-2.2.45/man/man1/setfacl.1 @@ -104,11 +104,15 @@ Test mode. Instead of changing the ACLs Apply operations to all files and directories recursively. This option cannot be mixed with `\-\-restore'. .TP 4 .I \-L, \-\-logical -Logical walk, follow symbolic links. The default behavior is to follow symbolic link arguments, and to skip symbolic links encountered -in subdirectories. This option cannot be mixed with `\-\-restore'. +Logical walk, follow symbolic links to directories. The default behavior is to follow +symbolic link arguments, and skip symbolic links encountered in subdirectories. +Only effective in combination with \-R. +This option cannot be mixed with `\-\-restore'. .TP 4 .I \-P, \-\-physical -Physical walk, skip all symbolic links. This also skips symbolic link arguments. +Physical walk, do not follow symbolic links to directories. +This also skips symbolic link arguments. +Only effective in combination with \-R. This option cannot be mixed with `\-\-restore'. 
.TP 4 .I \-\-version --Boundary-00=_IasRH48hyGEZElM Content-Type: text/x-diff; charset="iso-8859-1"; name="man-attr.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="man-attr.diff" Index: attr-2.4.39/man/man1/getfattr.1 =================================================================== --- attr-2.4.39.orig/man/man1/getfattr.1 +++ attr-2.4.39/man/man1/getfattr.1 @@ -56,11 +56,8 @@ while strings encoded as hexidecimal and 0x and 0s, respectively. .TP .BR \-h ", " \-\-no-dereference -Do not follow symlinks. -If -.I pathname -is a symbolic link, the symbolic link itself is examined, -rather than the file the link refers to. +Do not dereference symlinks. Instead of the file a symlink refers to, the +symlink itself is examined. .TP .BR \-m " \f2pattern\f1, " \-\-match "=\f2pattern\f1" Only include attributes with names matching the regular expression @@ -85,13 +82,15 @@ Dump out the extended attribute value(s) List the attributes of all files and directories recursively. .TP .BR \-L ", " \-\-logical -Logical walk, follow symbolic links. +Logical walk, follow symbolic links to directories. The default behaviour is to follow symbolic link arguments, and to skip symbolic links encountered in subdirectories. +Only effective in combination with \-R. .TP .BR \-P ", " \-\-physical -Physical walk, skip all symbolic links. +Physical walk, do not follow symbolic links to directories. This also skips symbolic link arguments. +Only effective in combination with \-R. 
.TP .B \-\-version Print the version of --Boundary-00=_IasRH48hyGEZElM-- From owner-xfs@oss.sgi.com Fri Nov 23 06:06:57 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 23 Nov 2007 06:07:03 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lANE6sTP030138 for ; Fri, 23 Nov 2007 06:06:57 -0800 Received: from hch by pentafluge.infradead.org with local (Exim 4.63 #1 (Red Hat Linux)) id 1IvYoA-00018l-J3; Fri, 23 Nov 2007 13:43:02 +0000 Date: Fri, 23 Nov 2007 13:43:02 +0000 From: Christoph Hellwig To: Lachlan McIlroy Cc: xfs-dev , xfs-oss Subject: Re: [PATCH] Fix up xfs_buf_associate_memory() Message-ID: <20071123134302.GA4256@infradead.org> References: <47465712.1050000@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <47465712.1050000@sgi.com> User-Agent: Mutt/1.4.2.3i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-Virus-Scanned: ClamAV 0.91.2/4890/Fri Nov 23 02:34:41 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13763 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs On Fri, Nov 23, 2007 at 03:29:06PM +1100, Lachlan McIlroy wrote: > Fixed a few bugs in xfs_buf_associate_memory() including: > > - calculation of 'page_count' was incorrect as it did not > consider the offset of 'mem' into the first page. The > logic to bump 'page_count' didn't work if 'len' was <= > PAGE_CACHE_SIZE (ie offset = 3k, len = 2k). > - setting b_buffer_length to 'len' is incorrect if > 'offset' is > 0. 
Set it to the total length of the > buffer. > - I suspect that passing a non-aligned address into > mem_to_page() for the first page may have been causing > issues - don't know but just tidy up that code anyway. > > These fixes prevent an data corruption issue that can > occur during log replay. Last time I tried to clean up this gem everything went bezerk, so be aware :) --- fs/xfs/linux-2.6/xfs_buf.c_1.247 2007-11-23 12:03:16.000000000 +1100 +++ fs/xfs/linux-2.6/xfs_buf.c 2007-11-23 12:02:32.000000000 +1100 @@ -726,14 +726,14 @@ xfs_buf_associate_memory( int rval; int i = 0; size_t ptr; + size_t buflen; off_t offset; int page_count; + ptr = (size_t) mem & PAGE_CACHE_MASK; + offset = (off_t) mem - (off_t) ptr; Casting pointers to size_t or off_t makes little sense, these should be unsigned long. And using a variable name of ptr is quite odd :) + while (i < bp->b_page_count) { + bp->b_pages[i++] = mem_to_page((void *)ptr); ptr += PAGE_CACHE_SIZE; } This could be much cleaner written as: for (i = 0; i < bp->b_page_count; i++) { bp->b_pages[i] = mem_to_page((void *)ptr); ptr += PAGE_CACHE_SIZE; } From owner-xfs@oss.sgi.com Fri Nov 23 06:31:09 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 23 Nov 2007 06:31:17 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-0.4 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_42, J_CHICKENPOX_45,J_CHICKENPOX_46,J_CHICKENPOX_47,J_CHICKENPOX_52, J_CHICKENPOX_63,J_CHICKENPOX_66,J_CHICKENPOX_73 autolearn=no version=3.3.0-r574664 Received: from mail.pawisda.de (mail.pawisda.de [213.157.4.156]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lANEV5lW001190 for ; Fri, 23 Nov 2007 06:31:06 -0800 Received: from localhost (localhost.intra.frontsite.de [127.0.0.1]) by mail.pawisda.de (Postfix) with ESMTP id CC41CF64E; Fri, 23 Nov 2007 15:31:08 +0100 (CET) Received: from mail.pawisda.de ([127.0.0.1]) by localhost (ndb [127.0.0.1]) 
(amavisd-new, port 10024) with ESMTP id 24393-07; Fri, 23 Nov 2007 15:30:52 +0100 (CET) Received: from [192.168.51.2] (lw-pc002.intra.frontsite.de [192.168.51.2]) by mail.pawisda.de (Postfix) with ESMTP id 9C8BDF640; Fri, 23 Nov 2007 15:30:52 +0100 (CET) Message-ID: <4746E41C.7090803@linworks.de> Date: Fri, 23 Nov 2007 15:30:52 +0100 From: Ruben Porras User-Agent: Mozilla-Thunderbird 2.0.0.6 (X11/20071009) MIME-Version: 1.0 To: David Chinner Cc: Barry Naujok , "xfs@oss.sgi.com" , xfs-dev Subject: Re: REVIEW: xfs_reno #2 References: <20071120013651.GR995458@sgi.com> In-Reply-To: <20071120013651.GR995458@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-Virus-Scanned: ClamAV 0.91.2/4890/Fri Nov 23 02:34:41 2007 on oss.sgi.com X-Virus-Scanned: by amavisd-new at pawisda.de X-Virus-Status: Clean X-archive-position: 13764 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: ruben.porras@linworks.de Precedence: bulk X-list: xfs David Chinner schrieb: > On Thu, Oct 04, 2007 at 02:25:16PM +1000, Barry Naujok wrote: > >> A couple changes from the first xfs_reno: >> >> - Major one is that symlinks are now supported, but only >> owner, group and extended attributes are copied for them >> (not times or inode attributes). >> >> - Man page! >> >> >> To make this better, ideally we need some form of >> "swap inodes" function in the kernel, where the entire >> contents of the inode themselves are swapped. This form >> can handle any inode and without any of the dir/file/attr/etc >> copy/swap mechanisms we have in xfs_reno. >> > > Something like the attached patch? > > This is proof-of-concept. I've compiled it but I haven't tested > it. Your mission, Barry, should you choose to accept it, it to > make it work ;) > > Cheers, > > Dave. > Great! Inline are changes to xfs_reno to make it use the new ioctl. It is also a proof-of-concept. 
I've compiled it but I haven't tested it ;) Now process_(dir|file|slink) do more or less the same, should be a good idea to mix them in one function as I did with them inside the "recover" function. I also did s/sx/si/ on the xfs_swapino_t to make the notation consistent with xfs_swapext_t. Some questions, why do we use ftruncate on files? before moving file inodes, XFS_XFLAG_IMMUTABLE | XFS_XFLAG_APPEND are checked. I extended the check also for directories but not for symlinks. Under which conditions are these flags setted. Can't we skip the test no that we do struct copy of the inode? Have a nice weekend. -- Rubén Porras LinWorks GmbH -- xfs_reno.c | 737 +++++++++++++++++-------------------------------------------- 1 file changed, 212 insertions(+), 525 deletions(-) --- xfs_reno_old.c 2007-11-22 18:55:36.276029053 +0100 +++ xfs_reno.c 2007-11-23 15:00:53.283564811 +0100 @@ -50,30 +50,24 @@ #include #include -#define ATTRBUFSIZE 1024 - #define SCAN_PHASE 0x00 #define DIR_PHASE 0x10 /* nothing done or all done */ -#define DIR_PHASE_1 0x11 /* target dir created */ -#define DIR_PHASE_2 0x12 /* temp dir created */ -#define DIR_PHASE_3 0x13 /* attributes backed up to temp */ -#define DIR_PHASE_4 0x14 /* dirents moved to target dir */ -#define DIR_PHASE_5 0x15 /* attributes applied to target dir */ -#define DIR_PHASE_6 0x16 /* src dir removed */ -#define DIR_PHASE_7 0x17 /* temp dir removed */ -#define DIR_PHASE_MAX 0x17 +#define DIR_PHASE_1 0x11 /* temp dir created */ +#define DIR_PHASE_2 0x12 /* swapped extents and inodes */ +#define DIR_PHASE_3 0x13 /* src dir removed */ +#define DIR_PHASE_MAX 0x13 /* renamed temp to source name */ #define FILE_PHASE 0x20 /* nothing done or all done */ #define FILE_PHASE_1 0x21 /* temp file created */ -#define FILE_PHASE_2 0x22 /* swapped extents */ +#define FILE_PHASE_2 0x22 /* swapped extents and inodes */ #define FILE_PHASE_3 0x23 /* unlinked source */ -#define FILE_PHASE_4 0x24 /* renamed temp to source name */ -#define 
FILE_PHASE_MAX 0x24 +#define FILE_PHASE_4 0x24 /* hard links copied */ +#define FILE_PHASE_MAX 0x24 /* renamed temp to source name */ #define SLINK_PHASE 0x30 /* nothing done or all done */ #define SLINK_PHASE_1 0x31 /* temp symlink created */ #define SLINK_PHASE_2 0x32 /* symlink attrs copied */ #define SLINK_PHASE_3 0x33 /* unlinked source */ -#define SLINK_PHASE_4 0x34 /* renamed temp to source name */ -#define SLINK_PHASE_MAX 0x34 +#define SLINK_PHASE_4 0x34 /* hard links copied */ +#define SLINK_PHASE_MAX 0x34 /* renamed temp to source name */ static void update_recoverfile(void); #define SET_PHASE(x) (cur_phase = x, update_recoverfile()) @@ -117,7 +111,6 @@ static time_t starttime; static bignode_t *cur_node; static char *cur_target; -static char *cur_temp; static int cur_phase; static int highest_numpaths; static char *recover_file; @@ -189,6 +182,60 @@ err_message(_("Cannot stat %s: %s\n"), s, strerror(errno)); } +static void +err_swapino( + int err, + const char *srcname) +{ + if (log_level >= LOG_DEBUG) { + switch (err) { + case EIO: + err_message(_("Filesystem is going down: %s: %s"), + srcname, strerror(err)); + break; + + default: + err_message(_("Swap inode failed: %s: %s"), + srcname, strerror(err)); + break; + } + } else + err_message(_("Swap inode failed: %s: %s"), + srcname, strerror(err)); +} + +static void +err_swapext( + int err, + const char *srcname, + xfs_off_t bs_size) +{ + if (log_level >= LOG_DEBUG) { + switch (err) { + case ENOTSUP: + err_message("%s: file type not supported", + srcname); + break; + case EFAULT: + /* The file has changed since we started the copy */ + err_message("%s: file modified, " + "inode renumber aborted: %ld", + srcname, bs_size); + break; + case EBUSY: + /* Timestamp has changed or mmap'ed file */ + err_message("%s: file busy", srcname); + break; + default: + err_message(_("Swap extents failed: %s: %s"), + srcname, strerror(errno)); + break; + } + } else + err_message(_("Swap extents failed: %s: %s"), + srcname, 
strerror(errno)); +} + /* * usage message */ @@ -224,15 +271,15 @@ } static int -xfs_getxattr(int fd, struct fsxattr *attr) +xfs_swapino(int fd, xfs_swapino_t *iu) { - return ioctl(fd, XFS_IOC_FSGETXATTR, attr); + return ioctl(fd, XFS_IOC_SWAPINO, iu); } static int -xfs_setxattr(int fd, struct fsxattr *attr) +xfs_getxattr(int fd, struct fsxattr *attr) { - return ioctl(fd, XFS_IOC_FSSETXATTR, attr); + return ioctl(fd, XFS_IOC_FSGETXATTR, attr); } /* @@ -461,253 +508,19 @@ return 0; } -/* - * Attribute cloning code - most of this is here because attr_copy does not - * let us pick and choose which attributes we want to copy. - */ - -attr_multiop_t attr_ops[ATTR_MAX_MULTIOPS]; - -/* - * Grab attributes specified in attr_ops from source file and write them - * out on the destination file. - */ - -static int -attr_replicate( - char *source, - char *target, - int count) -{ - int j, k; - - if (attr_multi(source, attr_ops, count, ATTR_DONTFOLLOW) < 0) - return -1; - - for (k = 0; k < count; k++) { - if (attr_ops[k].am_error) { - err_message(_("Error %d getting attribute"), - attr_ops[k].am_error); - break; - } - attr_ops[k].am_opcode = ATTR_OP_SET; - } - if (attr_multi(target, attr_ops, k, ATTR_DONTFOLLOW) < 0) - err_message("on attr_multif set"); - for (j = 0; j < k; j++) { - if (attr_ops[j].am_error) { - err_message(_("Error %d setting attribute"), - attr_ops[j].am_error); - return -1; - } - } - - return 0; -} - -/* - * Copy all the attributes specified from src to dst. 
- */ - -static int -attr_clone_copy( - char *source, - char *target, - char *list_buf, - char *attr_buf, - int buf_len, - int flags) -{ - attrlist_t *alist; - attrlist_ent_t *attr; - attrlist_cursor_t cursor; - int space, i, j; - char *ptr; - - bzero((char *)&cursor, sizeof(cursor)); - do { - if (attr_list(source, list_buf, ATTRBUFSIZE, - flags | ATTR_DONTFOLLOW, &cursor) < 0) { - err_message("on attr_listf"); - return -1; - } - - alist = (attrlist_t *)list_buf; - - space = buf_len; - ptr = attr_buf; - for (j = 0, i = 0; i < alist->al_count; i++) { - attr = ATTR_ENTRY(list_buf, i); - if (space < attr->a_valuelen) { - if (attr_replicate(source, target, j) < 0) - return -1; - j = 0; - space = buf_len; - ptr = attr_buf; - } - attr_ops[j].am_opcode = ATTR_OP_GET; - attr_ops[j].am_attrname = attr->a_name; - attr_ops[j].am_attrvalue = ptr; - attr_ops[j].am_length = (int) attr->a_valuelen; - attr_ops[j].am_flags = flags; - attr_ops[j].am_error = 0; - j++; - ptr += attr->a_valuelen; - space -= attr->a_valuelen; - } - - log_message(LOG_NITTY, "copying attribute %d", i); - - if (j) { - if (attr_replicate(source, target, j) < 0) - return -1; - } - - } while (alist->al_more); - - return 0; -} - -static int -clone_attribs( - char *source, - char *target) -{ - char list_buf[ATTRBUFSIZE]; - char *attr_buf; - int rval; - - attr_buf = malloc(ATTR_MAX_VALUELEN * 2); - if (attr_buf == NULL) { - err_nomem(); - return -1; - } - rval = attr_clone_copy(source, target, list_buf, attr_buf, - ATTR_MAX_VALUELEN * 2, 0); - if (rval == 0) - rval = attr_clone_copy(source, target, list_buf, attr_buf, - ATTR_MAX_VALUELEN * 2, ATTR_ROOT); - if (rval == 0) - rval = attr_clone_copy(source, target, list_buf, attr_buf, - ATTR_MAX_VALUELEN * 2, ATTR_SECURE); - free(attr_buf); - return rval; -} - -static int -dup_attributes( - char *source, - int sfd, - char *target, - int tfd) -{ - struct stat64 st; - struct timeval tv[2]; - struct fsxattr fsx; - - if (fstat64(sfd, &st) < 0) { - err_stat(source); - 
return -1; - } - - if (xfs_getxattr(sfd, &fsx) < 0) { - err_stat(source); - return -1; - } - - tv[0].tv_sec = st.st_atim.tv_sec; - tv[0].tv_usec = st.st_atim.tv_nsec / 1000; - tv[1].tv_sec = st.st_mtim.tv_sec; - tv[1].tv_usec = st.st_mtim.tv_nsec / 1000; - - if (futimes(tfd, tv) < 0) - err_message(_("%s: Cannot update target times"), target); - - if (fchown(tfd, st.st_uid, st.st_gid) < 0) { - err_message(_("%s: Cannot change target ownership to " - "uid(%d) gid(%d)"), target, - st.st_uid, st.st_gid); - - if (fchmod(tfd, st.st_mode & ~(S_ISUID | S_ISGID)) < 0) - err_message(_("%s: Cannot change target mode " - "to (%o)"), target, st.st_mode); - } else if (fchmod(tfd, st.st_mode) < 0) - err_message(_("%s: Cannot change target mode to (%o)"), - target, st.st_mode); - - if (xfs_setxattr(tfd, &fsx) < 0) - err_message(_("%s: Cannet set target extended " - "attributes"), target); - - return clone_attribs(source, target); -} - -static int -move_dirents( - char *srcpath, - char *targetpath, - int *move_count) -{ - int rval = 0; - DIR *srcd; - struct dirent64 *dp; - char srcname[PATH_MAX]; - char targetname[PATH_MAX]; - - *move_count = 0; - - srcd = opendir(srcpath); - if (srcd == NULL) { - err_open(srcpath); - return 1; - } - - while ((dp = readdir64(srcd)) != NULL) { - if (dp->d_ino == 0 || !strcmp(dp->d_name, ".") || - !strcmp(dp->d_name, "..")) - continue; - - if (strlen(srcpath) + 1 + strlen(dp->d_name) >= - sizeof(srcname) - 1) { - - err_message(_("%s/%s: Name too long"), srcpath, - dp->d_name); - rval = 1; - goto quit; - } - - sprintf(srcname, "%s/%s", srcpath, dp->d_name); - sprintf(targetname, "%s/%s", targetpath, dp->d_name); - - rval = rename(srcname, targetname); - if (rval != 0) { - err_message(_("failed to rename: \'%s\' to \'%s\'"), - srcname, targetname); - goto quit; - } - - log_message(LOG_DEBUG, "rename %s -> %s", srcname, targetname); - - (*move_count)++; - } - -quit: - closedir(srcd); - return rval; -} - static int process_dir( bignode_t *node) { int sfd 
= -1; int tfd = -1; - int targetfd = -1; int rval = 0; - int move_count = 0; + struct stat64 st; char *srcname = NULL; char *pname = NULL; - struct stat64 s1; + xfs_swapino_t si; + xfs_swapext_t sx; + xfs_bstat_t bstatbuf; struct fsxattr fsx; char target[PATH_MAX] = ""; @@ -718,14 +531,19 @@ cur_node = node; srcname = node->paths[0]; - if (stat64(srcname, &s1) < 0) { + bzero(&st, sizeof(st)); + bzero(&bstatbuf, sizeof(bstatbuf)); + bzero(&si, sizeof(si)); + bzero(&sx, sizeof(sx)); + + if (stat64(srcname, &st) < 0) { if (errno != ENOENT) { err_stat(srcname); global_rval |= 2; } goto quit; } - if (s1.st_ino <= XFS_MAXINUMBER_32 && !force_all) { + if (st.st_ino <= XFS_MAXINUMBER_32 && !force_all) { /* * This directory has already changed ino's, probably due * to being moved during processing of a parent directory. @@ -737,7 +555,7 @@ rval = 1; sfd = open(srcname, O_RDONLY); - if (sfd < 0) { + if (sfd == -1) { err_open(srcname); goto quit; } @@ -754,7 +572,12 @@ if (fsx.fsx_xflags & (XFS_XFLAG_IMMUTABLE | XFS_XFLAG_APPEND)) { err_message(_("%s: immutable/append, ignoring"), srcname); global_rval |= 2; - rval = 0; + goto quit; + } + + if (realuid != 0 && realuid != st.st_uid) { + errno = EACCES; + err_open(srcname); goto quit; } @@ -770,7 +593,11 @@ err_message(_("Unable to create directory copy: %s"), srcname); goto quit; } - SET_PHASE(DIR_PHASE_1); + tfd = open(target, O_RDONLY); + if (tfd == -1) { + err_open(target); + goto quit; + } cur_target = strdup(target); if (!cur_target) { @@ -778,81 +605,64 @@ goto quit; } - sprintf(target, "%s/%sXXXXXX", pname, cmd_prefix); - if (mkdtemp(target) == NULL) { - err_message(_("unable to create tmp directory copy")); - goto quit; - } - SET_PHASE(DIR_PHASE_2); + SET_PHASE(DIR_PHASE_1); - cur_temp = strdup(target); - if (!cur_temp) { - err_nomem(); - goto quit; - } + /* swapino src target */ + si.si_version = XFS_SI_VERSION; + si.si_fdtarget = tfd; + si.si_fdtmp = sfd; - tfd = open(cur_temp, O_RDONLY); - if (tfd < 0) { - 
err_open(cur_temp); - goto quit; + /* swap the inodes */ + rval = xfs_swapino(tfd, &si); + if (rval < 0) { + err_swapino(rval, srcname); + goto quit_unlink; } - targetfd = open(cur_target, O_RDONLY); - if (tfd < 0) { - err_open(cur_target); + if (xfs_bulkstat_single(sfd, &st.st_ino, &bstatbuf) < 0) { + err_message(_("unable to bulkstat source file: %s"), + srcname); + unlink(target); goto quit; } - - /* copy timestamps, attribs and EAs, to cur_temp */ - rval = dup_attributes(srcname, sfd, cur_temp, tfd); - if (rval != 0) { - err_message(_("unable to duplicate directory attributes: %s"), + if (bstatbuf.bs_ino != st.st_ino) { + err_message(_("bulkstat of source file returned wrong inode: %s"), srcname); - goto quit_unlink; + unlink(target); + goto quit; } - SET_PHASE(DIR_PHASE_3); - - /* move src dirents to cur_target (this changes timestamps on src) */ - rval = move_dirents(srcname, cur_target, &move_count); - if (rval != 0) { - err_message(_("unable to move directory contents: %s to %s"), - srcname, cur_target); - /* uh oh, move everything back... 
*/ - if (move_count > 0) - goto quit_undo; - } + ftruncate64(tfd, bstatbuf.bs_size); - SET_PHASE(DIR_PHASE_4); + /* swapextents src target */ + sx.sx_stat = bstatbuf; /* struct copy */ + sx.sx_version = XFS_SX_VERSION; + sx.sx_fdtarget = sfd; + sx.sx_fdtmp = tfd; + sx.sx_offset = 0; + sx.sx_length = bstatbuf.bs_size; - /* copy timestamps, attribs and EAs from cur_temp to cur_target */ - rval = dup_attributes(cur_temp, tfd, cur_target, targetfd); - if (rval != 0) { - err_message(_("unable to duplicate directory attributes: %s"), - cur_temp); + /* Swap the extents */ + rval = xfs_swapext(sfd, &sx); + if (rval < 0) { + err_swapext(rval, srcname, bstatbuf.bs_size); goto quit_unlink; } - SET_PHASE(DIR_PHASE_5); + SET_PHASE(DIR_PHASE_2); /* rmdir src */ rval = rmdir(srcname); if (rval != 0) { err_message(_("unable to remove directory: %s"), srcname); - goto quit_undo; + goto quit; } - SET_PHASE(DIR_PHASE_6); - - rval = rmdir(cur_temp); - if (rval != 0) - err_message(_("unable to remove tmp directory: %s"), cur_temp); - - SET_PHASE(DIR_PHASE_7); + SET_PHASE(DIR_PHASE_3); /* rename cur_target src */ - rval = rename(cur_target, srcname); + rval = rename(target, srcname); if (rval != 0) { /* * we can't abort since the src dir is now gone. @@ -863,18 +673,10 @@ } goto quit; - quit_undo: - if (move_dirents(cur_target, srcname, &move_count) != 0) { - /* oh, dear lord... 
let the admin clean this one up */ - err_message(_("unable to move directory contents back: %s to %s"), - cur_target, srcname); - goto quit; - } - SET_PHASE(DIR_PHASE_3); - quit_unlink: - rmdir(cur_target); - rmdir(cur_temp); + rval = rmdir(target); + if (rval != 0) + err_message(_("unable to remove directory: %s"), target); quit: @@ -884,16 +686,13 @@ close(sfd); if (tfd >= 0) close(tfd); - if (targetfd >= 0) - close(targetfd); free(pname); free(cur_target); - free(cur_temp); cur_target = NULL; - cur_temp = NULL; cur_node = NULL; + numdirsdone++; return rval; } @@ -906,9 +705,10 @@ int tfd = -1; int i = 0; int rval = 0; - struct stat64 s1; + struct stat64 st; char *srcname = NULL; char *pname = NULL; + xfs_swapino_t si; xfs_swapext_t sx; xfs_bstat_t bstatbuf; struct fsxattr fsx; @@ -921,37 +721,36 @@ cur_node = node; srcname = node->paths[0]; - bzero(&s1, sizeof(s1)); + bzero(&st, sizeof(st)); bzero(&bstatbuf, sizeof(bstatbuf)); + bzero(&si, sizeof(si)); bzero(&sx, sizeof(sx)); - if (stat64(srcname, &s1) < 0) { + if (stat64(srcname, &st) < 0) { if (errno != ENOENT) { err_stat(srcname); global_rval |= 2; } goto quit; } - if (s1.st_ino <= XFS_MAXINUMBER_32 && !force_all) + if (st.st_ino <= XFS_MAXINUMBER_32 && !force_all) /* this file has changed, and no longer needs processing */ goto quit; + rval = 1; /* open and sync source */ sfd = open(srcname, O_RDWR | O_DIRECT); if (sfd < 0) { err_open(srcname); - rval = 1; goto quit; } if (!platform_test_xfs_fd(sfd)) { err_not_xfs(srcname); - rval = 1; goto quit; } if (fsync(sfd) < 0) { err_message(_("sync failed: %s: %s"), srcname, strerror(errno)); - rval = 1; goto quit; } @@ -963,7 +762,7 @@ * but before all reads have completed to block xfs_reno reads. * This change just closes the window a bit. 
*/ - if ((s1.st_mode & S_ISGID) && !(s1.st_mode & S_IXGRP)) { + if ((st.st_mode & S_ISGID) && !(st.st_mode & S_IXGRP)) { struct flock fl; fl.l_type = F_RDLCK; @@ -988,7 +787,6 @@ if (xfs_getxattr(sfd, &fsx) < 0) { err_message(_("failed to get inode attrs: %s"), srcname); - rval = 1; goto quit; } if (fsx.fsx_xflags & (XFS_XFLAG_IMMUTABLE | XFS_XFLAG_APPEND)) { @@ -997,9 +795,7 @@ goto quit; } - rval = 1; - - if (realuid != 0 && realuid != s1.st_uid) { + if (realuid != 0 && realuid != st.st_uid) { errno = EACCES; err_open(srcname); goto quit; @@ -1012,9 +808,10 @@ goto quit; } dirname(pname); + sprintf(target, "%s/%sXXXXXX", pname, cmd_prefix); tfd = mkstemp(target); - if (tfd < 0) { + if (tfd == -1) { err_message("unable to create file copy"); goto quit; } @@ -1026,30 +823,26 @@ SET_PHASE(FILE_PHASE_1); - /* Setup direct I/O */ - if (fcntl(tfd, F_SETFL, O_DIRECT) < 0 ) { - err_message(_("could not set O_DIRECT for %s on tmp: %s"), - srcname, target); - unlink(target); - goto quit; - } + /* swapino src target */ + si.si_version = XFS_SI_VERSION; + si.si_fdtarget = sfd; + si.si_fdtmp = tfd; - /* copy attribs & EAs to target */ - if (dup_attributes(srcname, sfd, target, tfd) != 0) { - err_message(_("unable to duplicate file attributes: %s"), - srcname); - unlink(target); - goto quit; + /* swap the inodes */ + rval = xfs_swapino(sfd, &si); + if (rval < 0) { + err_swapino(rval, srcname); + goto quit_unlink; } - if (xfs_bulkstat_single(sfd, &s1.st_ino, &bstatbuf) < 0) { + if (xfs_bulkstat_single(sfd, &st.st_ino, &bstatbuf) < 0) { err_message(_("unable to bulkstat source file: %s"), srcname); unlink(target); goto quit; } - if (bstatbuf.bs_ino != s1.st_ino) { + if (bstatbuf.bs_ino != st.st_ino) { err_message(_("bulkstat of source file returned wrong inode: %s"), srcname); unlink(target); @@ -1069,44 +862,8 @@ /* Swap the extents */ rval = xfs_swapext(sfd, &sx); if (rval < 0) { - if (log_level >= LOG_DEBUG) { - switch (errno) { - case ENOTSUP: - err_message("%s: file type 
not supported", - srcname); - break; - case EFAULT: - /* The file has changed since we started the copy */ - err_message("%s: file modified, " - "inode renumber aborted: %ld", - srcname, bstatbuf.bs_size); - break; - case EBUSY: - /* Timestamp has changed or mmap'ed file */ - err_message("%s: file busy", srcname); - break; - default: - err_message(_("Swap extents failed: %s: %s"), - srcname, strerror(errno)); - break; - } - } else - err_message(_("Swap extents failed: %s: %s"), - srcname, strerror(errno)); - goto quit; - } - - if (bstatbuf.bs_dmevmask | bstatbuf.bs_dmstate) { - struct fsdmidata fssetdm; - - /* Set the DMAPI Fields. */ - fssetdm.fsd_dmevmask = bstatbuf.bs_dmevmask; - fssetdm.fsd_padding = 0; - fssetdm.fsd_dmstate = bstatbuf.bs_dmstate; - - if (ioctl(tfd, XFS_IOC_FSSETDM, (void *)&fssetdm ) < 0) - err_message(_("attempt to set DMI attributes " - "of %s failed"), target); + err_swapext(rval, srcname, bstatbuf.bs_size); + goto quit_unlink; } SET_PHASE(FILE_PHASE_2); @@ -1152,8 +909,12 @@ numfilesdone++; } + quit_unlink: + rval = unlink(target); + if (rval != 0) + err_message(_("unable to remove file: %s"), target); + quit: - cur_node = NULL; SET_PHASE(FILE_PHASE); @@ -1166,6 +927,7 @@ free(cur_target); cur_target = NULL; + cur_node = NULL; numfilesdone++; return rval; @@ -1177,12 +939,15 @@ bignode_t *node) { int i = 0; + int sfd = -1; + int tfd = -1; int rval = 0; struct stat64 st; char *srcname = NULL; char *pname = NULL; char target[PATH_MAX] = ""; char linkbuf[PATH_MAX]; + xfs_swapino_t si; SET_PHASE(SLINK_PHASE); @@ -1191,6 +956,9 @@ cur_node = node; srcname = node->paths[0]; + bzero(&st, sizeof(st)); + bzero(&si, sizeof(si)); + if (lstat64(srcname, &st) < 0) { if (errno != ENOENT) { err_stat(srcname); @@ -1204,6 +972,13 @@ rval = 1; + /* open source */ + sfd = open(srcname, O_RDWR | O_DIRECT); + if (sfd < 0) { + err_open(srcname); + goto quit; + } + i = readlink(srcname, linkbuf, sizeof(linkbuf) - 1); if (i < 0) { err_message(_("unable to read 
symlink: %s"), srcname); @@ -1226,7 +1001,8 @@ dirname(pname); sprintf(target, "%s/%sXXXXXX", pname, cmd_prefix); - if (mktemp(target) == NULL) { + tfd = mkstemp(target); + if (tfd == -1) { err_message(_("unable to create temp symlink name")); goto quit; } @@ -1243,19 +1019,15 @@ SET_PHASE(SLINK_PHASE_1); - /* copy ownership & EAs to target */ - if (lchown(target, st.st_uid, st.st_gid) < 0) { - err_message(_("%s: Cannot change target ownership to " - "uid(%d) gid(%d)"), target, - st.st_uid, st.st_gid); - unlink(target); - goto quit; - } + /* swapino src target */ + si.si_version = XFS_SI_VERSION; + si.si_fdtarget = sfd; + si.si_fdtmp = tfd; - if (clone_attribs(srcname, target) != 0) { - err_message(_("unable to duplicate symlink attributes: %s"), - srcname); - unlink(target); + /* swap the inodes */ + rval = xfs_swapino(sfd, &si); + if (rval < 0) { + err_swapino(rval, srcname); goto quit; } @@ -1374,8 +1146,8 @@ for (i = 0; i < cur_node->numpaths; i++) len += sprintf(buf + len, "%s\n", cur_node->paths[i]); - len += sprintf(buf + len, "target: %s\ntemp: %s\nend\n", - cur_target, cur_temp); +/* len += sprintf(buf + len, "target: %s\ntemp: %s\nend\n", */ +/* cur_target, cur_temp); */ ASSERT(len < buf_size); @@ -1468,7 +1240,7 @@ ino_t ino; int ftw_flags; char buf[PATH_MAX + 10]; /* path + "target: " */ - struct stat64 s; + struct stat64 st; int first_path; /* @@ -1543,12 +1315,12 @@ log_message(LOG_DEBUG, "path: '%s'", buf); if (buf[0] == '/') { - if (stat64(buf, &s) < 0) { + if (stat64(buf, &st) < 0) { err_message(_("Recovery failed: cannot " "stat '%s'"), buf); goto quit; } - if (s.st_ino != ino) { + if (st.st_ino != ino) { err_message(_("Recovery failed: inode " "number for '%s' does not " "match recorded number"), buf); @@ -1569,7 +1341,7 @@ err_nomem(); goto quit; } - if (stat64(*target, &s) < 0) { + if (stat64(*target, &st) < 0) { err_message(_("Recovery failed: cannot " "stat '%s'"), *target); goto quit; @@ -1619,12 +1391,10 @@ char *tname, int phase) { - int 
tfd = -1; - int targetfd = -1; char *srcname = NULL; int rval = 0; int i; - int move_count = 0; + int dir; dump_node("recover", node); log_message(LOG_DEBUG, "target: %s, phase: %x", target, phase); @@ -1632,137 +1402,54 @@ if (node) srcname = node->paths[0]; + dir = (phase < DIR_PHASE || phase > DIR_PHASE_MAX); + switch (phase) { - case DIR_PHASE_2: -rmtemps: - log_message(LOG_NORMAL, _("Removing temporary directory: '%s'"), - tname); - if (rmdir(tname) < 0 && errno != ENOENT) { - err_message(_("unable to remove directory: %s"), tname); - rval = 1; - } - /* FALL THRU */ case DIR_PHASE_1: - log_message(LOG_NORMAL, _("Removing target directory: '%s'"), - target); - if (rmdir(target) < 0 && errno != ENOENT) { - err_message(_("unable to remove directory: %s"), - target); - rval = 1; - } - break; + case FILE_PHASE_1: + case SLINK_PHASE_1: + log_message(LOG_NORMAL, _("Unlinking temporary %s: \'%s\'"), + dir ? "directory" : "file", target); - case DIR_PHASE_3: - log_message(LOG_NORMAL, _("Completing moving directory " - "contents: '%s' to '%s'"), srcname, target); - if (move_dirents(srcname, target, &move_count) != 0) { - err_message(_("unable to move directory contents: " - "%s to %s"), srcname, target); - /* uh oh, move everything back... */ - if (move_count > 0) { - if (move_dirents(target, srcname, - &move_count) != 0) { - /* oh, dear lord... 
let the admin - * clean this one up */ - err_message(_("unable to move directory " - "contents back: %s to %s"), - target, srcname); - exit(1); - } - } - goto rmtemps; - } - /* FALL THRU */ - case DIR_PHASE_4: - log_message(LOG_NORMAL, _("Setting attributes for target " - "directory: \'%s\'"), target); - tfd = open(tname, O_RDONLY); - if (tfd < 0) { - err_open(tname); - rval = 1; - break; - } - targetfd = open(target, O_RDONLY); - if (targetfd < 0) { - err_open(target); - rval = 1; - break; - } - rval = dup_attributes(tname, tfd, target, targetfd); - if (rval != 0) { - err_message(_("unable to duplicate directory " - "attributes: %s"), tname); - break; - } - close(tfd); - close(targetfd); - /* FALL THRU */ - case DIR_PHASE_6: - log_message(LOG_NORMAL, _("Removing temporary directory: \'%s\'"), - tname); - if (rmdir(tname) < 0 && errno != ENOENT) { - err_message(_("unable to remove directory: %s"), - tname); - rval = 1; - break; - } - /* FALL THRU */ - case DIR_PHASE_5: - log_message(LOG_NORMAL, _("Removing old directory: \'%s\'"), - srcname); - if (rmdir(srcname) < 0 && errno != ENOENT) { - err_message(_("unable to remove directory: %s"), - srcname); - rval = 1; - break; - } - /* FALL THRU */ - case DIR_PHASE_7: - log_message(LOG_NORMAL, _("Renaming new directory to old " - "directory: \'%s\' -> \'%s\'"), target, srcname); - rval = rename(target, srcname); - if (rval != 0) { - /* we can't abort since the src dir is now gone. - * let the admin clean this one up - */ - err_message(_("unable to rename directory: %s to %s"), - target, srcname); - break; - } - break; + rval = dir ? rmdir(target) : unlink(target); + if ( rval < 0 && errno != ENOENT) + err_message(_("unable to remove %s: %s"), + dir ? 
"directory" : "file", target); - case FILE_PHASE_1: - case SLINK_PHASE_1: - log_message(LOG_NORMAL, _("Unlinking temporary file: \'%s\'"), - target); - unlink(target); break; + case DIR_PHASE_2: case FILE_PHASE_2: case SLINK_PHASE_2: - log_message(LOG_NORMAL, _("Unlinking old file: \'%s\'"), - srcname); - rval = unlink(srcname); - if (rval != 0) { - err_message(_("unable to remove file: %s"), srcname); + log_message(LOG_NORMAL, _("Unlinking old %s: \'%s\'"), + dir ? "directory" : "file", srcname); + + rval = dir ? rmdir(target) : unlink(srcname); + + if (rval < 0 && errno != ENOENT) { + err_message(_("unable to remove %s: %s"), + dir ? "directory" : "file", srcname); break; } /* FALL THRU */ + case DIR_PHASE_3: case FILE_PHASE_3: case SLINK_PHASE_3: - log_message(LOG_NORMAL, _("Renaming new file to old file: " + log_message(LOG_NORMAL, _("Renaming: " "\'%s\' -> \'%s\'"), target, srcname); rval = rename(target, srcname); if (rval != 0) { /* we can't abort since the src file is now gone. * let the admin clean this one up */ - err_message(_("unable to rename file: %s to %s"), + err_message(_("unable to rename: %s to %s"), target, srcname); break; } + if (dir) + break; /* FALL THRU */ case FILE_PHASE_4: case SLINK_PHASE_4: From owner-xfs@oss.sgi.com Fri Nov 23 09:37:19 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 23 Nov 2007 09:37:24 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lANHbFjv029158 for ; Fri, 23 Nov 2007 09:37:19 -0800 Received: from hch by pentafluge.infradead.org with local (Exim 4.63 #1 (Red Hat Linux)) id 1IvcSx-0003Bc-AB; Fri, 23 Nov 2007 17:37:23 +0000 Date: Fri, 23 Nov 2007 17:37:23 +0000 From: Christoph Hellwig To: David Chinner Cc: 
xfs-oss , lkml Subject: Re: [PATCH 3/9] Use _META bio I/O types for metadata I/O Message-ID: <20071123173723.GA12227@infradead.org> References: <20071122003512.GI114266761@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071122003512.GI114266761@sgi.com> User-Agent: Mutt/1.4.2.3i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-Virus-Scanned: ClamAV 0.91.2/4890/Fri Nov 23 02:34:41 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13765 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs On Thu, Nov 22, 2007 at 11:35:12AM +1100, David Chinner wrote: > Improve metadata I/O merging in the elevator > > Change all async metadata buffers to use [READ|WRITE]_META I/O types > so that the I/O doesn't get issued immediately. This allows merging > of adjacent metadata requests but still prioritises them over bulk > data. This shows a 10-15% improvement in sequential create speed of > small files. > > Don't include the log buffers in this classification - leave them > as sync types so they are issued immediately. Looks good, and just including the trivial fs.h addition here might be okay as well. 
From owner-xfs@oss.sgi.com Fri Nov 23 09:40:07 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 23 Nov 2007 09:40:12 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lANHe5h4029813 for ; Fri, 23 Nov 2007 09:40:07 -0800 Received: from hch by pentafluge.infradead.org with local (Exim 4.63 #1 (Red Hat Linux)) id 1IvcVh-0003Da-53; Fri, 23 Nov 2007 17:40:13 +0000 Date: Fri, 23 Nov 2007 17:40:13 +0000 From: Christoph Hellwig To: David Chinner Cc: xfs-oss , lkml Subject: Re: [PATCH 4/9] Factor common inode cluster buffer lookup code Message-ID: <20071123174013.GB12227@infradead.org> References: <20071122003642.GJ114266761@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071122003642.GJ114266761@sgi.com> User-Agent: Mutt/1.4.2.3i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-Virus-Scanned: ClamAV 0.91.2/4890/Fri Nov 23 02:34:41 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13766 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs On Thu, Nov 22, 2007 at 11:36:42AM +1100, David Chinner wrote: > +STATIC int > +xfs_ino_to_imap( > + xfs_mount_t *mp, > + xfs_trans_t *tp, > + xfs_ino_t ino, > + xfs_imap_t *imap, > + uint imap_flags) > +{ > + int error; > + > + error = xfs_imap(mp, tp, ino, imap, imap_flags); > + if (error) { > + cmn_err(CE_WARN, "xfs_ino_to_imap: xfs_imap() returned an " > + "error %d on %s. 
Returning error.", > + error, mp->m_fsname); > + return error; > + } > + > + /* > + * If the inode number maps to a block outside the bounds > + * of the file system then return NULL rather than calling > + * read_buf and panicing when we get an error from the > + * driver. > + */ > + if ((imap->im_blkno + imap->im_len) > > + XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks)) { > + xfs_fs_cmn_err(CE_ALERT, mp, "xfs_ino_to_imap: " > + "(imap->im_blkno (0x%llx) + imap->im_len (0x%llx)) > " > + " XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) (0x%llx)", > + (unsigned long long) imap->im_blkno, > + (unsigned long long) imap->im_len, > + XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks)); > + return XFS_ERROR(EINVAL); > + } What about just adding this verification to xfs_imap instead of creating this wrapper for two of its three callers? Otherwise this patch looks fine to me. From owner-xfs@oss.sgi.com Fri Nov 23 09:50:47 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 23 Nov 2007 09:50:54 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lANHojw3031319 for ; Fri, 23 Nov 2007 09:50:47 -0800 Received: from hch by pentafluge.infradead.org with local (Exim 4.63 #1 (Red Hat Linux)) id 1Ivcg1-0003KI-5L; Fri, 23 Nov 2007 17:50:53 +0000 Date: Fri, 23 Nov 2007 17:50:53 +0000 From: Christoph Hellwig To: David Chinner Cc: xfs-oss , lkml Subject: Re: [PATCH 5/9] Don't block pdflush when flushing inodes Message-ID: <20071123175053.GA12649@infradead.org> References: <20071122003817.GK114266761@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071122003817.GK114266761@sgi.com> User-Agent: Mutt/1.4.2.3i X-SRS-Rewrite: SMTP reverse-path 
rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-Virus-Scanned: ClamAV 0.91.2/4890/Fri Nov 23 02:34:41 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13767 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs > +++ 2.6.x-xfs-new/fs/xfs/xfs_inode.c 2007-11-22 10:33:51.037704348 +1100 > @@ -183,12 +183,20 @@ xfs_imap_to_bp( > int ni; > xfs_buf_t *bp; > > + if (buf_flags == 0) > + buf_flags = XFS_BUF_LOCK; There are just two callers and they never pass 0, so this is not needed. > + error = xfs_itobp_flags(mp, NULL, ip, &dip, &bp, 0, 0, > + (noblock) ? XFS_BUF_TRYLOCK : XFS_BUF_LOCK); No need for the parentheses around noblock. > +int xfs_itobp_flags(struct xfs_mount *, struct xfs_trans *, > xfs_inode_t *, struct xfs_dinode **, struct xfs_buf **, > - xfs_daddr_t, uint); > + xfs_daddr_t, uint, uint); > +#define xfs_itobp(mp, tp, ip, dipp, bpp, bno, iflags) \ > + xfs_itobp_flags(mp, tp, ip, dipp, bpp, bno, iflags, XFS_BUF_LOCK) I'd say just convert xfs_itobp and all its users to take the additional argument. 
From owner-xfs@oss.sgi.com Fri Nov 23 09:58:55 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 23 Nov 2007 09:58:59 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lANHwrkT000443 for ; Fri, 23 Nov 2007 09:58:55 -0800 Received: from hch by pentafluge.infradead.org with local (Exim 4.63 #1 (Red Hat Linux)) id 1Ivcnt-0003Pa-NW; Fri, 23 Nov 2007 17:59:01 +0000 Date: Fri, 23 Nov 2007 17:59:01 +0000 From: Christoph Hellwig To: David Chinner Cc: xfs-oss , lkml Subject: Re: [PATCH 6/9] Remove xfs_icluster Message-ID: <20071123175901.GA12866@infradead.org> References: <20071122003952.GL114266761@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071122003952.GL114266761@sgi.com> User-Agent: Mutt/1.4.2.3i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-Virus-Scanned: ClamAV 0.91.2/4890/Fri Nov 23 02:34:41 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13768 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs On Thu, Nov 22, 2007 at 11:39:52AM +1100, David Chinner wrote: > Remove the xfs_icluster structure and replace with a radix tree lookup. > > We don't need to keep a list of inodes in each cluster around anymore > as we can look them up quickly when we need to. The only time we need > to do this now is during inode writeback. 
> > Factor the inode cluster writeback code out of xfs_iflush and convert > it to use radix_tree_gang_lookup() instead of walking a list of > inodes built when we first read in the inodes. > > This remove 3 pointers from each xfs_inode structure and the xfs_icluster > structure per inode cluster. Hence we reduce the cache footprint of the > xfs_inodes by between 5-10% depending on cluster sparseness. > > To be truly efficient we need a radix_tree_gang_lookup_range() call > to stop searching once we are past the end of the cluster instead > of trying to find a full cluster's worth of inodes. Nice, I like this a lot. I was wondering about something like this already when you put in the radix-tree based inode cache. > +STATIC int > +xfs_iflush_cluster( > + xfs_inode_t *ip, > + xfs_buf_t *bp) > +{ > + xfs_mount_t *mp = ip->i_mount; > + xfs_perag_t *pag = xfs_get_perag(mp, ip->i_ino); > + unsigned long first_index, mask; > + int ilist_size; > + xfs_inode_t *ilist; > + xfs_inode_t *iq; > + xfs_inode_log_item_t *iip; > + int nr_found; > + int clcount = 0; > + int bufwasdelwri; > + > + ASSERT(pag->pagi_inodeok); > + ASSERT(pag->pag_ici_init); > + > + ilist_size = XFS_INODE_CLUSTER_SIZE(mp) * sizeof(xfs_inode_t *); > + ilist = kmem_alloc(ilist_size, KM_MAYFAIL); > + if (!ilist) > + return 0; Now if you just used the linux native allocator this could be a kcalloc :) > + if ((iq->i_update_core == 0) && > + ((iip == NULL) || > + !(iip->ili_format.ilf_fields & XFS_ILOG_ALL)) && > + xfs_ipincount(iq) == 0) { > + continue; > + } if (!iq->i_update_core && (!iip || !(iip->ili_format.ilf_fields & XFS_ILOG_ALL)) && !xfs_ipincount(iq)) continue; > + /* > + * arriving here means that this inode can be flushed. First > + * re-check that it's dirty before flushing. 
> + */ > + iip = iq->i_itemp; > + if ((iq->i_update_core != 0) || ((iip != NULL) && > + (iip->ili_format.ilf_fields & XFS_ILOG_ALL))) { if (iq->i_update_core || (iip && (iip->ili_format.ilf_fields & XFS_ILOG_ALL))) { > + /* > + * Clean up the buffer. If it was B_DELWRI, just release it -- > + * brelse can handle it with no problems. If not, shut down the > + * filesystem before releasing the buffer. > + */ > + bufwasdelwri = XFS_BUF_ISDELAYWRITE(bp); > + if (bufwasdelwri) > + xfs_buf_relse(bp); > + > + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); > + > + if (!bufwasdelwri) { > + /* > + * Just like incore_relse: if we have b_iodone functions, > + * mark the buffer as an error and call them. Otherwise > + * mark it as stale and brelse. > + */ > + if (XFS_BUF_IODONE_FUNC(bp)) { > + XFS_BUF_CLR_BDSTRAT_FUNC(bp); > + XFS_BUF_UNDONE(bp); > + XFS_BUF_STALE(bp); > + XFS_BUF_SHUT(bp); > + XFS_BUF_ERROR(bp,EIO); > + xfs_biodone(bp); > + } else { > + XFS_BUF_STALE(bp); > + xfs_buf_relse(bp); > + } > + } What's the point of all this if the filesystem is shut down anyway? 
From owner-xfs@oss.sgi.com Fri Nov 23 10:02:33 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 23 Nov 2007 10:02:37 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lANI2V0o001622 for ; Fri, 23 Nov 2007 10:02:32 -0800 Received: from hch by pentafluge.infradead.org with local (Exim 4.63 #1 (Red Hat Linux)) id 1IvcrP-0003T2-HI; Fri, 23 Nov 2007 18:02:39 +0000 Date: Fri, 23 Nov 2007 18:02:39 +0000 From: Christoph Hellwig To: David Chinner Cc: xfs-oss , lkml Subject: Re: [PATCH 9/9] Clean up open coded inode dirty checks Message-ID: <20071123180239.GA13229@infradead.org> References: <20071122004422.GO114266761@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071122004422.GO114266761@sgi.com> User-Agent: Mutt/1.4.2.3i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-Virus-Scanned: ClamAV 0.91.2/4890/Fri Nov 23 02:34:41 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13769 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs > +STATIC_INLINE int xfs_inode_clean(xfs_inode_t *ip) > +{ > + return (((ip->i_itemp == NULL) || > + !(ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL)) && > + (ip->i_update_core == 0)); > +} Can we please get rid of this useless STATIC_INLINE junk? It's really hurting my eyes. As does to a lesser extent the verbose style of this function. 
This should be something like: static inline int xfs_inode_clean(struct xfs_inode *ip) { return (!ip->i_itemp || !(ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL)) && !ip->i_update_core; } From owner-xfs@oss.sgi.com Fri Nov 23 10:33:57 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 23 Nov 2007 10:34:01 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from sovereign.computergmbh.de (sovereign.computergmbh.de [85.214.69.204]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lANIXsjh006596 for ; Fri, 23 Nov 2007 10:33:56 -0800 Received: by sovereign.computergmbh.de (Postfix, from userid 25121) id D38F61803FD85; Fri, 23 Nov 2007 19:16:08 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by sovereign.computergmbh.de (Postfix) with ESMTP id CAB911C05DF15; Fri, 23 Nov 2007 19:16:08 +0100 (CET) Date: Fri, 23 Nov 2007 19:16:08 +0100 (CET) From: Jan Engelhardt To: Christoph Hellwig cc: David Chinner , xfs-oss , lkml Subject: Re: [PATCH 9/9] Clean up open coded inode dirty checks In-Reply-To: <20071123180239.GA13229@infradead.org> Message-ID: References: <20071122004422.GO114266761@sgi.com> <20071123180239.GA13229@infradead.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: ClamAV 0.91.2/4890/Fri Nov 23 02:34:41 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13770 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jengelh@computergmbh.de Precedence: bulk X-list: xfs On Nov 23 2007 18:02, Christoph Hellwig wrote: > >> +STATIC_INLINE int xfs_inode_clean(xfs_inode_t *ip) >> +{ >> + return (((ip->i_itemp == NULL) || >> + !(ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL)) && >> + (ip->i_update_core == 0)); >> +} > >Can we please get rid of this useless STATIC_INLINE 
junk? It's really >hurting my eyes. > >As does to a lesser extent the verbose style of this >function. I have to disagree, but whatever. >static inline int xfs_inode_clean(struct xfs_inode *ip) ^ ^ could be bool - and const >{ > return (!ip->i_itemp || > !(ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL)) && > !ip->i_update_core; >} Perhaps for greater readability: static inline bool xfs_inode_clean(const struct xfs_inode *ip) { if (ip->i_itemp == NULL) return true; if (!(ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL) && ip->i_update_core == NULL) return true; return false; } From owner-xfs@oss.sgi.com Fri Nov 23 12:16:01 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 23 Nov 2007 12:16:11 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from sovereign.computergmbh.de (sovereign.computergmbh.de [85.214.69.204]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lANKFw1t018617 for ; Fri, 23 Nov 2007 12:16:01 -0800 Received: by sovereign.computergmbh.de (Postfix, from userid 25121) id 4E8C41803FD88; Fri, 23 Nov 2007 21:16:05 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by sovereign.computergmbh.de (Postfix) with ESMTP id 4905A1C05DF58; Fri, 23 Nov 2007 21:16:05 +0100 (CET) Date: Fri, 23 Nov 2007 21:16:05 +0100 (CET) From: Jan Engelhardt To: Joe Perches cc: Christoph Hellwig , David Chinner , xfs-oss , lkml Subject: Re: [PATCH 9/9] Clean up open coded inode dirty checks In-Reply-To: <1195847251.4930.21.camel@localhost> Message-ID: References: <20071122004422.GO114266761@sgi.com> <20071123180239.GA13229@infradead.org> <1195847251.4930.21.camel@localhost> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: ClamAV 0.91.2/4890/Fri Nov 23 02:34:41 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13771 X-ecartis-version: Ecartis 
v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jengelh@computergmbh.de Precedence: bulk X-list: xfs On Nov 23 2007 11:47, Joe Perches wrote: >On Fri, 2007-11-23 at 19:16 +0100, Jan Engelhardt wrote: >> static inline bool xfs_inode_clean(const struct xfs_inode *ip) >> { >> if (ip->i_itemp == NULL) >> return true; >> if (!(ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL) && >> ip->i_update_core == NULL) >> return true; >> return false; >> } > >Your code changed the test. See - the previous cryptic constructs could not even be decoded ;-) >xfs_inode.i_update_core is an unsigned char. > >I believe reordering the tests to avoid a possibly >unnecessary dereference is better. > > if (ip->i_update_core) > return false; > if (!ip->i_itemp) > return true; > return ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL; Yeah, something like that. Note: the function SHOULD return bool for this, to quash the ilf_fields & XFS_ILOG_ALL into 0/1. From owner-xfs@oss.sgi.com Fri Nov 23 12:51:31 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 23 Nov 2007 12:51:34 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=0.2 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from perches.com (DSL022.labridge.com [206.117.136.22]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lANKpSSe028464 for ; Fri, 23 Nov 2007 12:51:30 -0800 Received: from [192.168.1.128] ([192.168.1.128]) by perches.com (8.9.3/8.9.3) with ESMTP id LAA16163; Fri, 23 Nov 2007 11:58:29 -0800 Subject: Re: [PATCH 9/9] Clean up open coded inode dirty checks From: Joe Perches To: Jan Engelhardt Cc: Christoph Hellwig , David Chinner , xfs-oss , lkml In-Reply-To: References: <20071122004422.GO114266761@sgi.com> <20071123180239.GA13229@infradead.org> Content-Type: text/plain Date: Fri, 23 Nov 2007 11:47:31 -0800 Message-Id: 
<1195847251.4930.21.camel@localhost> Mime-Version: 1.0 X-Mailer: Evolution 2.12.0-2mdv2008.0 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4890/Fri Nov 23 02:34:41 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13772 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: joe@perches.com Precedence: bulk X-list: xfs On Fri, 2007-11-23 at 19:16 +0100, Jan Engelhardt wrote: > static inline bool xfs_inode_clean(const struct xfs_inode *ip) > { > if (ip->i_itemp == NULL) > return true; > if (!(ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL) && > ip->i_update_core == NULL) > return true; > return false; > } Your code changed the test. xfs_inode.i_update_core is an unsigned char. I believe reordering the tests to avoid a possibly unnecessary dereference is better. if (ip->i_update_core) return false; if (!ip->i_itemp) return true; return ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL; From owner-xfs@oss.sgi.com Sat Nov 24 10:43:54 2007 Received: with ECARTIS (v1.0.0; list xfs); Sat, 24 Nov 2007 10:44:02 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.9 required=5.0 tests=AWL,BAYES_00, RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.0-r574664 Received: from waste.org (waste.org [66.93.16.53]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAOIhpQZ011727 for ; Sat, 24 Nov 2007 10:43:54 -0800 Received: from waste.org (localhost [127.0.0.1]) by waste.org (8.13.8/8.13.8/Debian-3) with ESMTP id lAOIhZ8I002695 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Sat, 24 Nov 2007 12:43:35 -0600 Received: (from oxymoron@localhost) by waste.org (8.13.8/8.13.8/Submit) id lAOIhXMu002684; Sat, 24 Nov 2007 12:43:33 -0600 Date: Sat, 24 Nov 2007 12:43:33 -0600 From: Matt Mackall To: Andi Kleen Cc: David Chinner , Stewart Smith , xfs-oss , lkml Subject: Re: [PATCH 2/9]: 
Reduce Log I/O latency Message-ID: <20071124184333.GK17536@waste.org> References: <20071122003339.GH114266761__34694.2978365861$1195691722$gmane$org@sgi.com> <20071122011214.GR114266761@sgi.com> <1195702123.8369.78.camel@localhost.localdomain> <20071122120611.GA3573@one.firstfloor.org> <20071122131539.GX114266761@sgi.com> <20071123025317.GA12257@one.firstfloor.org> <20071123040329.GB114266761@sgi.com> <20071123120115.GA18532@one.firstfloor.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071123120115.GA18532@one.firstfloor.org> User-Agent: Mutt/1.5.13 (2006-08-11) X-Virus-Scanned: ClamAV 0.91.2/4902/Sat Nov 24 06:41:20 2007 on oss.sgi.com X-Virus-Scanned: by amavisd-new X-Virus-Status: Clean X-archive-position: 13773 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: mpm@selenic.com Precedence: bulk X-list: xfs On Fri, Nov 23, 2007 at 01:01:15PM +0100, Andi Kleen wrote: > On Fri, Nov 23, 2007 at 03:03:29PM +1100, David Chinner wrote: > > On Fri, Nov 23, 2007 at 03:53:17AM +0100, Andi Kleen wrote: > > > On Fri, Nov 23, 2007 at 12:15:39AM +1100, David Chinner wrote: > > > > On Thu, Nov 22, 2007 at 01:06:11PM +0100, Andi Kleen wrote: > > > > > > FWIW from a "real time" database POV this seems to make sense to me... > > > > > > in fact, we probably rely on filesystem metadata way too much > > > > > > (historically it's just "worked".... although we do seem to get issues > > > > > > on ext3). > > > > > > > > > > For that case you really would need priority inheritance: any metadata > > > > > IO on behalf of or blocking a process needs to use the process' block IO > > > > > priority. > > > > > > > > How do you do that when the processes are blocking on semaphores, > > > > mutexes or rw-semaphores in the filesystem three layers removed from > > > > the I/O in progress? > > > > > > [...]
I didn't say it was easy (or rather explicitly said it would be tricky). > > > Probably it would be possible to fold it somehow into rt mutexes PI, > > > but it's not easy and semaphores would need to be handled too. > > > > > > My point was just that, to solve the metadata RT problem, unconditionally increasing > > > the priority is a bad idea and not really a replacement for a "full" > > > solution. Short term a user can just increase the priority of all the XFS > > > threads anyway. > > > > The point is that it's not actually a thread-based problem - the priority > > can't be inherited via the traditional mutex-like manner. There is no > > connection between a thread and an I/O it has already issued and so you > > can't transfer a priority from a blocked thread to an issued-but-blocked > > I/O.... > > It could be handled in theory similar to standard CPU priority inheritance -- > keep track of IO priority of all threads you block and boost your IO priority > always to that level. But it would probably not be very easy to do. Well I think what Dave is saying is that we can't find the related process. The submitter process may have even exited before the flush happens. You'd instead have to keep track of (the max of) all the submitted I/O segment priorities related to the transaction. But I'm sure there are complications. -- Mathematics is the supreme nostalgia of our time.
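The idea in the message above, tracking the maximum priority of all I/O segments submitted against a transaction rather than looking for the submitting process, can be sketched in plain C. This is only a minimal user-space illustration under stated assumptions, not XFS or block-layer code: struct txn_prio and the txn_prio_* helpers are hypothetical names, and priority is assumed to be a plain integer where a larger value means more urgent.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical per-transaction priority record (not an XFS structure).
 * Assumption: priority is a plain int, larger value = more urgent.
 */
struct txn_prio {
	int max_prio;	/* highest priority of any segment seen so far */
	int nr_ios;	/* number of I/O segments attached to the transaction */
};

/* Start a transaction at some base priority. */
static void txn_prio_init(struct txn_prio *tp, int base_prio)
{
	assert(tp != NULL);
	tp->max_prio = base_prio;
	tp->nr_ios = 0;
}

/*
 * Record one submitted I/O segment. Only the priority value is kept,
 * so the submitting process may exit before the flush happens.
 */
static void txn_prio_note_io(struct txn_prio *tp, int io_prio)
{
	assert(tp != NULL);
	if (io_prio > tp->max_prio)
		tp->max_prio = io_prio;
	tp->nr_ios++;
}

/* Priority to use when the transaction is finally flushed. */
static int txn_prio_flush_prio(const struct txn_prio *tp)
{
	assert(tp != NULL);
	return tp->max_prio;
}
```

After txn_prio_init() with a base of 0 and txn_prio_note_io() calls with priorities 2, 7 and 4, txn_prio_flush_prio() returns 7. The sketch deliberately ignores the complications the thread anticipates, such as locking and lowering the priority again once a high-priority segment completes.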
From owner-xfs@oss.sgi.com Sun Nov 25 08:47:45 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 25 Nov 2007 08:47:52 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_33, J_CHICKENPOX_34,J_CHICKENPOX_36,J_CHICKENPOX_66 autolearn=no version=3.3.0-r574664 Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAPGlhXF032142 for ; Sun, 25 Nov 2007 08:47:44 -0800 Received: from hch by pentafluge.infradead.org with local (Exim 4.63 #1 (Red Hat Linux)) id 1IwKN4-0004gz-7s; Sun, 25 Nov 2007 16:30:14 +0000 Date: Sun, 25 Nov 2007 16:30:14 +0000 From: Christoph Hellwig To: Chris Wedgwood Cc: linux-xfs@oss.sgi.com, LKML Subject: [PATCH] xfs: revert to double-buffering readdir Message-ID: <20071125163014.GA17922@infradead.org> References: <20071114070400.GA25708@puku.stupidest.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071114070400.GA25708@puku.stupidest.org> User-Agent: Mutt/1.4.2.3i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-Virus-Scanned: ClamAV 0.91.2/4909/Sun Nov 25 02:49:37 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13774 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs The current readdir implementation deadlocks on btree buffer locks because nfsd calls back into ->lookup from the filldir callback. The only short-term fix for this is to revert to the old inefficient double-buffering scheme. This patch does exactly that and reverts xfs_file_readdir to what's basically the 2.6.23 version minus the uio and vnops junk.
I'll try to find something more optimal for 2.6.25 or at least find a way to use the proper version for local access. Signed-off-by: Christoph Hellwig Index: linux-2.6/fs/xfs/linux-2.6/xfs_file.c =================================================================== --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_file.c 2007-11-25 11:41:20.000000000 +0100 +++ linux-2.6/fs/xfs/linux-2.6/xfs_file.c 2007-11-25 17:14:27.000000000 +0100 @@ -218,6 +218,15 @@ } #endif /* CONFIG_XFS_DMAPI */ +/* + * Unfortunately we can't just use the clean and simple readdir implementation + * below, because nfs might call back into ->lookup from the filldir callback + * and that will deadlock the low-level btree code. + * + * Hopefully we'll find a better workaround that allows to use the optimal + * version at least for local readdirs for 2.6.25. + */ +#if 0 STATIC int xfs_file_readdir( struct file *filp, @@ -249,6 +258,121 @@ return -error; return 0; } +#else + +struct hack_dirent { + int namlen; + loff_t offset; + u64 ino; + unsigned int d_type; + char name[]; +}; + +struct hack_callback { + char *dirent; + size_t len; + size_t used; +}; + +STATIC int +xfs_hack_filldir( + void *__buf, + const char *name, + int namlen, + loff_t offset, + u64 ino, + unsigned int d_type) +{ + struct hack_callback *buf = __buf; + struct hack_dirent *de = (struct hack_dirent *)(buf->dirent + buf->used); + + if (buf->used + sizeof(struct hack_dirent) + namlen > buf->len) + return -EINVAL; + + de->namlen = namlen; + de->offset = offset; + de->ino = ino; + de->d_type = d_type; + memcpy(de->name, name, namlen); + buf->used += sizeof(struct hack_dirent) + namlen; + return 0; +} + +STATIC int +xfs_file_readdir( + struct file *filp, + void *dirent, + filldir_t filldir) +{ + struct inode *inode = filp->f_path.dentry->d_inode; + xfs_inode_t *ip = XFS_I(inode); + struct hack_callback buf; + struct hack_dirent *de; + int error; + loff_t size; + int eof = 0; + xfs_off_t start_offset, curr_offset, offset; + + /* + * Try fairly 
hard to get memory + */ + buf.len = PAGE_CACHE_SIZE; + do { + buf.dirent = kmalloc(buf.len, GFP_KERNEL); + if (buf.dirent) + break; + buf.len >>= 1; + } while (buf.len >= 1024); + + if (!buf.dirent) + return -ENOMEM; + + curr_offset = filp->f_pos; + if (curr_offset == 0x7fffffff) + offset = 0xffffffff; + else + offset = filp->f_pos; + + while (!eof) { + int reclen; + start_offset = offset; + + buf.used = 0; + error = -xfs_readdir(ip, &buf, buf.len, &offset, + xfs_hack_filldir); + if (error || offset == start_offset) { + size = 0; + break; + } + + size = buf.used; + de = (struct hack_dirent *)buf.dirent; + while (size > 0) { + if (filldir(dirent, de->name, de->namlen, + curr_offset & 0x7fffffff, + de->ino, de->d_type)) { + goto done; + } + + reclen = sizeof(struct hack_dirent) + de->namlen; + size -= reclen; + curr_offset = de->offset /* & 0x7fffffff */; + de = (struct hack_dirent *)((char *)de + reclen); + } + } + + done: + if (!error) { + if (size == 0) + filp->f_pos = offset & 0x7fffffff; + else if (de) + filp->f_pos = curr_offset; + } + + kfree(buf.dirent); + return error; +} +#endif STATIC int xfs_file_mmap( From owner-xfs@oss.sgi.com Sun Nov 25 08:48:28 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 25 Nov 2007 08:48:34 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAPGmQ0o032307 for ; Sun, 25 Nov 2007 08:48:28 -0800 Received: from hch by pentafluge.infradead.org with local (Exim 4.63 #1 (Red Hat Linux)) id 1IwKNq-0004iD-FP; Sun, 25 Nov 2007 16:31:02 +0000 Date: Sun, 25 Nov 2007 16:31:02 +0000 From: Christoph Hellwig To: Christoph Hellwig Cc: xfs@oss.sgi.com Subject: Re: [PATCH] cleanup XFS_IFORK_*/XFS_DFORK* macros Message-ID: 
<20071125163102.GB17922@infradead.org> References: <20070922102238.GA15732@lst.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20070922102238.GA15732@lst.de> User-Agent: Mutt/1.4.2.3i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-Virus-Scanned: ClamAV 0.91.2/4909/Sun Nov 25 02:49:37 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13775 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: xfs On Sat, Sep 22, 2007 at 12:22:38PM +0200, Christoph Hellwig wrote: > (try number three, maybe it manages to get through the list this time) > > Currently XFS_IFORK_* and XFS_DFORK* are implemented by means of > XFS_CFORK* macros. But given that XFS_IFORK_* operates on an > xfs_inode that embeds an xfs_icdinode_core and XFS_DFORK_* operates > on an xfs_dinode that embeds an xfs_dinode_core, one will have to do > endian swapping while the other doesn't. Instead of having the current > mess with the CFORK macros that have byteswapping and non-byteswapping > versions (which are inconsistently named while we're at it) just define > each family of the macros to stand by itself and simplify the whole > matter. > > A few direct references to the CFORK variants were cleaned up to > use IFORK or DFORK to make this possible. ping? this is almost two months old now.. > > > Signed-off-by: Christoph Hellwig > > Index: linux-2.6-xfs/fs/xfs/xfs_dinode.h > =================================================================== > --- linux-2.6-xfs.orig/fs/xfs/xfs_dinode.h 2007-08-23 18:52:49.000000000 +0200 > +++ linux-2.6-xfs/fs/xfs/xfs_dinode.h 2007-09-19 15:16:30.000000000 +0200 > @@ -171,69 +171,35 @@ typedef enum xfs_dinode_fmt > /* > * Inode data & attribute fork sizes, per inode.
> */ > -#define XFS_CFORK_Q(dcp) ((dcp)->di_forkoff != 0) > -#define XFS_CFORK_Q_DISK(dcp) ((dcp)->di_forkoff != 0) > - > -#define XFS_CFORK_BOFF(dcp) ((int)((dcp)->di_forkoff << 3)) > -#define XFS_CFORK_BOFF_DISK(dcp) ((int)((dcp)->di_forkoff << 3)) > - > -#define XFS_CFORK_DSIZE_DISK(dcp,mp) \ > - (XFS_CFORK_Q_DISK(dcp) ? XFS_CFORK_BOFF_DISK(dcp) : XFS_LITINO(mp)) > -#define XFS_CFORK_DSIZE(dcp,mp) \ > - (XFS_CFORK_Q(dcp) ? XFS_CFORK_BOFF(dcp) : XFS_LITINO(mp)) > - > -#define XFS_CFORK_ASIZE_DISK(dcp,mp) \ > - (XFS_CFORK_Q_DISK(dcp) ? XFS_LITINO(mp) - XFS_CFORK_BOFF_DISK(dcp) : 0) > -#define XFS_CFORK_ASIZE(dcp,mp) \ > - (XFS_CFORK_Q(dcp) ? XFS_LITINO(mp) - XFS_CFORK_BOFF(dcp) : 0) > - > -#define XFS_CFORK_SIZE_DISK(dcp,mp,w) \ > - ((w) == XFS_DATA_FORK ? \ > - XFS_CFORK_DSIZE_DISK(dcp, mp) : \ > - XFS_CFORK_ASIZE_DISK(dcp, mp)) > -#define XFS_CFORK_SIZE(dcp,mp,w) \ > - ((w) == XFS_DATA_FORK ? \ > - XFS_CFORK_DSIZE(dcp, mp) : XFS_CFORK_ASIZE(dcp, mp)) > +#define XFS_DFORK_Q(dip) ((dip)->di_core.di_forkoff != 0) > +#define XFS_DFORK_BOFF(dip) ((int)((dip)->di_core.di_forkoff << 3)) > > #define XFS_DFORK_DSIZE(dip,mp) \ > - XFS_CFORK_DSIZE_DISK(&(dip)->di_core, mp) > -#define XFS_DFORK_DSIZE_HOST(dip,mp) \ > - XFS_CFORK_DSIZE(&(dip)->di_core, mp) > + (XFS_DFORK_Q(dip) ? 
\ > + XFS_DFORK_BOFF(dip) : \ > + XFS_LITINO(mp)) > #define XFS_DFORK_ASIZE(dip,mp) \ > - XFS_CFORK_ASIZE_DISK(&(dip)->di_core, mp) > -#define XFS_DFORK_ASIZE_HOST(dip,mp) \ > - XFS_CFORK_ASIZE(&(dip)->di_core, mp) > -#define XFS_DFORK_SIZE(dip,mp,w) \ > - XFS_CFORK_SIZE_DISK(&(dip)->di_core, mp, w) > -#define XFS_DFORK_SIZE_HOST(dip,mp,w) \ > - XFS_CFORK_SIZE(&(dip)->di_core, mp, w) > - > -#define XFS_DFORK_Q(dip) XFS_CFORK_Q_DISK(&(dip)->di_core) > -#define XFS_DFORK_BOFF(dip) XFS_CFORK_BOFF_DISK(&(dip)->di_core) > -#define XFS_DFORK_DPTR(dip) ((dip)->di_u.di_c) > -#define XFS_DFORK_APTR(dip) \ > - ((dip)->di_u.di_c + XFS_DFORK_BOFF(dip)) > -#define XFS_DFORK_PTR(dip,w) \ > - ((w) == XFS_DATA_FORK ? XFS_DFORK_DPTR(dip) : XFS_DFORK_APTR(dip)) > -#define XFS_CFORK_FORMAT(dcp,w) \ > - ((w) == XFS_DATA_FORK ? (dcp)->di_format : (dcp)->di_aformat) > -#define XFS_CFORK_FMT_SET(dcp,w,n) \ > + (XFS_DFORK_Q(dip) ? \ > + XFS_LITINO(mp) - XFS_DFORK_BOFF(dip) : \ > + 0) > +#define XFS_DFORK_SIZE(dip,mp,w) \ > ((w) == XFS_DATA_FORK ? \ > - ((dcp)->di_format = (n)) : ((dcp)->di_aformat = (n))) > -#define XFS_DFORK_FORMAT(dip,w) XFS_CFORK_FORMAT(&(dip)->di_core, w) > + XFS_DFORK_DSIZE(dip, mp) : \ > + XFS_DFORK_ASIZE(dip, mp)) > > -#define XFS_CFORK_NEXTENTS_DISK(dcp,w) \ > +#define XFS_DFORK_DPTR(dip) ((dip)->di_u.di_c) > +#define XFS_DFORK_APTR(dip) \ > + ((dip)->di_u.di_c + XFS_DFORK_BOFF(dip)) > +#define XFS_DFORK_PTR(dip,w) \ > + ((w) == XFS_DATA_FORK ? XFS_DFORK_DPTR(dip) : XFS_DFORK_APTR(dip)) > +#define XFS_DFORK_FORMAT(dip,w) \ > ((w) == XFS_DATA_FORK ? \ > - be32_to_cpu((dcp)->di_nextents) : \ > - be16_to_cpu((dcp)->di_anextents)) > -#define XFS_CFORK_NEXTENTS(dcp,w) \ > - ((w) == XFS_DATA_FORK ? 
(dcp)->di_nextents : (dcp)->di_anextents) > -#define XFS_DFORK_NEXTENTS(dip,w) XFS_CFORK_NEXTENTS_DISK(&(dip)->di_core, w) > -#define XFS_DFORK_NEXTENTS_HOST(dip,w) XFS_CFORK_NEXTENTS(&(dip)->di_core, w) > - > -#define XFS_CFORK_NEXT_SET(dcp,w,n) \ > + (dip)->di_core.di_format : \ > + (dip)->di_core.di_aformat) > +#define XFS_DFORK_NEXTENTS(dip,w) \ > ((w) == XFS_DATA_FORK ? \ > - ((dcp)->di_nextents = (n)) : ((dcp)->di_anextents = (n))) > + be32_to_cpu((dip)->di_core.di_nextents) : \ > + be16_to_cpu((dip)->di_core.di_anextents)) > > #define XFS_BUF_TO_DINODE(bp) ((xfs_dinode_t *)XFS_BUF_PTR(bp)) > > Index: linux-2.6-xfs/fs/xfs/xfs_inode.h > =================================================================== > --- linux-2.6-xfs.orig/fs/xfs/xfs_inode.h 2007-09-19 15:09:31.000000000 +0200 > +++ linux-2.6-xfs/fs/xfs/xfs_inode.h 2007-09-19 15:16:30.000000000 +0200 > @@ -341,17 +341,42 @@ xfs_iflags_test_and_clear(xfs_inode_t *i > /* > * Fork handling. > */ > -#define XFS_IFORK_PTR(ip,w) \ > - ((w) == XFS_DATA_FORK ? &(ip)->i_df : (ip)->i_afp) > -#define XFS_IFORK_Q(ip) XFS_CFORK_Q(&(ip)->i_d) > -#define XFS_IFORK_DSIZE(ip) XFS_CFORK_DSIZE(&ip->i_d, ip->i_mount) > -#define XFS_IFORK_ASIZE(ip) XFS_CFORK_ASIZE(&ip->i_d, ip->i_mount) > -#define XFS_IFORK_SIZE(ip,w) XFS_CFORK_SIZE(&ip->i_d, ip->i_mount, w) > -#define XFS_IFORK_FORMAT(ip,w) XFS_CFORK_FORMAT(&ip->i_d, w) > -#define XFS_IFORK_FMT_SET(ip,w,n) XFS_CFORK_FMT_SET(&ip->i_d, w, n) > -#define XFS_IFORK_NEXTENTS(ip,w) XFS_CFORK_NEXTENTS(&ip->i_d, w) > -#define XFS_IFORK_NEXT_SET(ip,w,n) XFS_CFORK_NEXT_SET(&ip->i_d, w, n) > > +#define XFS_IFORK_Q(ip) ((ip)->i_d.di_forkoff != 0) > +#define XFS_IFORK_BOFF(ip) ((int)((ip)->i_d.di_forkoff << 3)) > + > +#define XFS_IFORK_PTR(ip,w) \ > + ((w) == XFS_DATA_FORK ? \ > + &(ip)->i_df : \ > + (ip)->i_afp) > +#define XFS_IFORK_DSIZE(ip) \ > + (XFS_IFORK_Q(ip) ? \ > + XFS_IFORK_BOFF(ip) : \ > + XFS_LITINO((ip)->i_mount)) > +#define XFS_IFORK_ASIZE(ip) \ > + (XFS_IFORK_Q(ip) ? 
\ > + XFS_LITINO((ip)->i_mount) - XFS_IFORK_BOFF(ip) : \ > + 0) > +#define XFS_IFORK_SIZE(ip,w) \ > + ((w) == XFS_DATA_FORK ? \ > + XFS_IFORK_DSIZE(ip) : \ > + XFS_IFORK_ASIZE(ip)) > +#define XFS_IFORK_FORMAT(ip,w) \ > + ((w) == XFS_DATA_FORK ? \ > + (ip)->i_d.di_format : \ > + (ip)->i_d.di_aformat) > +#define XFS_IFORK_FMT_SET(ip,w,n) \ > + ((w) == XFS_DATA_FORK ? \ > + ((ip)->i_d.di_format = (n)) : \ > + ((ip)->i_d.di_aformat = (n))) > +#define XFS_IFORK_NEXTENTS(ip,w) \ > + ((w) == XFS_DATA_FORK ? \ > + (ip)->i_d.di_nextents : \ > + (ip)->i_d.di_anextents) > +#define XFS_IFORK_NEXT_SET(ip,w,n) \ > + ((w) == XFS_DATA_FORK ? \ > + ((ip)->i_d.di_nextents = (n)) : \ > + ((ip)->i_d.di_anextents = (n))) > > #ifdef __KERNEL__ > > @@ -504,7 +529,7 @@ void xfs_dinode_to_disk(struct xfs_dino > struct xfs_icdinode *); > > uint xfs_ip2xflags(struct xfs_inode *); > -uint xfs_dic2xflags(struct xfs_dinode_core *); > +uint xfs_dic2xflags(struct xfs_dinode *); > int xfs_ifree(struct xfs_trans *, xfs_inode_t *, > struct xfs_bmap_free *); > int xfs_itruncate_start(xfs_inode_t *, uint, xfs_fsize_t); > Index: linux-2.6-xfs/fs/xfs/xfs_inode.c > =================================================================== > --- linux-2.6-xfs.orig/fs/xfs/xfs_inode.c 2007-09-19 15:09:31.000000000 +0200 > +++ linux-2.6-xfs/fs/xfs/xfs_inode.c 2007-09-19 15:16:30.000000000 +0200 > @@ -826,15 +826,17 @@ xfs_ip2xflags( > xfs_icdinode_t *dic = &ip->i_d; > > return _xfs_dic2xflags(dic->di_flags) | > - (XFS_CFORK_Q(dic) ? XFS_XFLAG_HASATTR : 0); > + (XFS_IFORK_Q(ip) ? XFS_XFLAG_HASATTR : 0); > } > > uint > xfs_dic2xflags( > - xfs_dinode_core_t *dic) > + xfs_dinode_t *dip) > { > + xfs_dinode_core_t *dic = &dip->di_core; > + > return _xfs_dic2xflags(be16_to_cpu(dic->di_flags)) | > - (XFS_CFORK_Q_DISK(dic) ? XFS_XFLAG_HASATTR : 0); > + (XFS_DFORK_Q(dip) ? 
XFS_XFLAG_HASATTR : 0); > } > > /* > Index: linux-2.6-xfs/fs/xfs/xfs_itable.c > =================================================================== > --- linux-2.6-xfs.orig/fs/xfs/xfs_itable.c 2007-09-12 13:56:17.000000000 +0200 > +++ linux-2.6-xfs/fs/xfs/xfs_itable.c 2007-09-19 15:16:30.000000000 +0200 > @@ -170,7 +170,7 @@ xfs_bulkstat_one_dinode( > buf->bs_mtime.tv_nsec = be32_to_cpu(dic->di_mtime.t_nsec); > buf->bs_ctime.tv_sec = be32_to_cpu(dic->di_ctime.t_sec); > buf->bs_ctime.tv_nsec = be32_to_cpu(dic->di_ctime.t_nsec); > - buf->bs_xflags = xfs_dic2xflags(dic); > + buf->bs_xflags = xfs_dic2xflags(dip); > buf->bs_extsize = be32_to_cpu(dic->di_extsize) << mp->m_sb.sb_blocklog; > buf->bs_extents = be32_to_cpu(dic->di_nextents); > buf->bs_gen = be32_to_cpu(dic->di_gen); > @@ -299,7 +299,7 @@ xfs_bulkstat_use_dinode( > } > /* BULKSTAT_FG_INLINE: if attr fork is local, or not there, use it */ > aformat = dip->di_core.di_aformat; > - if ((XFS_CFORK_Q(&dip->di_core) == 0) || > + if ((XFS_DFORK_Q(dip) == 0) || > (aformat == XFS_DINODE_FMT_LOCAL) || > (aformat == XFS_DINODE_FMT_EXTENTS && !dip->di_core.di_anextents)) { > *dipp = dip; > Index: linux-2.6-xfs/fs/xfs/dmapi/xfs_dm.c > =================================================================== > --- linux-2.6-xfs.orig/fs/xfs/dmapi/xfs_dm.c 2007-09-19 18:50:35.000000000 +0200 > +++ linux-2.6-xfs/fs/xfs/dmapi/xfs_dm.c 2007-09-19 18:51:01.000000000 +0200 > @@ -355,7 +355,7 @@ xfs_ip2dmflags( > xfs_inode_t *ip) > { > return _xfs_dic2dmflags(ip->i_d.di_flags) | > - (XFS_CFORK_Q(&ip->i_d) ? DM_XFLAG_HASATTR : 0); > + (XFS_IFORK_Q(ip) ? DM_XFLAG_HASATTR : 0); > } > > STATIC uint > @@ -363,8 +363,7 @@ xfs_dic2dmflags( > xfs_dinode_t *dip) > { > return _xfs_dic2dmflags(be16_to_cpu(dip->di_core.di_flags)) | > - (XFS_CFORK_Q_DISK(&dip->di_core) ? > - DM_XFLAG_HASATTR : 0); > + (XFS_DFORK_Q(dip) ? 
DM_XFLAG_HASATTR : 0); > } > > /* > > ---end quoted text--- From owner-xfs@oss.sgi.com Sun Nov 25 11:04:53 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 25 Nov 2007 11:04:55 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: **** X-Spam-Status: No, score=4.2 required=5.0 tests=BAYES_99,J_CHICKENPOX_37, J_CHICKENPOX_39 autolearn=no version=3.3.0-r574664 Received: from smtp-out4.libero.it (smtp-out4.libero.it [212.52.84.46]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAPJ4nPI019223 for ; Sun, 25 Nov 2007 11:04:53 -0800 Received: from MailRelay10.libero.it (192.168.32.119) by smtp-out4.libero.it (7.3.120) id 4688F3500FAC7E7E for xfs@oss.sgi.com; Sun, 25 Nov 2007 19:40:02 +0100 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgAAAJBQSUdVMejA/2dsb2JhbAAIkEc Received: from unknown (HELO libero.it) ([192.168.17.4]) by outrelay10.libero.it with ESMTP; 25 Nov 2007 19:39:55 +0100 Date: Sun, 25 Nov 2007 19:39:55 +0100 Message-Id: Subject: winning MIME-Version: 1.0 X-Sensitivity: 3 Content-Type: text/plain; charset=iso-8859-1 From: "onlinneedd" X-XaM3-API-Version: 4.3 (R1) (B3pl19) X-SenderIP: 85.49.232.192 To: undisclosed-recipients:; X-Virus-Scanned: ClamAV 0.91.2/4911/Sun Nov 25 09:58:35 2007 on oss.sgi.com X-Virus-Status: Clean Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id lAPJ4rPI019232 X-archive-position: 13776 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: onlinneedd@libero.it Precedence: bulk X-list: xfs You have won 550,000.00 Euro in DE EURO MILLIONES ONLINE INT.LOTTERY SPAIN.For further development for Clarification and procedure please Contact , Mr Javier Lopez E-mail: milloooffice@aim.com TEL: +34 696 756 270 (1)Tic Nr: 6460DGH (2) Sr Nr: 0909AOB09 (3) LU Nr: 726726XZJHN (4)BTNr: 2GH267XZZ1-5-42 (5) RF Nr 9527BCV-33-7-7-7 Regards. 
Mss.SodicLasy From owner-xfs@oss.sgi.com Sun Nov 25 14:59:28 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 25 Nov 2007 15:00:38 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAPMxP1e031725 for ; Sun, 25 Nov 2007 14:59:27 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA29228; Mon, 26 Nov 2007 09:59:30 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAPMxTdD120653485; Mon, 26 Nov 2007 09:59:29 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAPMxSjQ119334534; Mon, 26 Nov 2007 09:59:28 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Mon, 26 Nov 2007 09:59:28 +1100 From: David Chinner To: Lachlan McIlroy Cc: xfs-dev , xfs-oss Subject: Re: [PATCH, RFC] Delayed logging of file sizes Message-ID: <20071125225928.GE114266761@sgi.com> References: <47467B87.2000000@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <47467B87.2000000@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4911/Sun Nov 25 09:58:35 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13778 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Fri, Nov 23, 2007 at 06:04:39PM +1100, Lachlan McIlroy wrote: > The easy solution is to log everything so that log replay doesn't need > to check if the 
on-disk version is newer - it can just replay the log. > But logging everything would cause too much log traffic so this patch > is a compromise and it logs a transaction before we flush an inode to > disk only if it has changes that have not yet been logged. The problem with this is that the inode will be marked dirty during the transaction, so we'll never be able to clean an inode if we issue a transaction during inode writeback. Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Sun Nov 25 15:17:34 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 25 Nov 2007 15:17:39 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=BAYES_50 autolearn=ham version=3.3.0-r574664 Received: from smtp110.mail.mud.yahoo.com (smtp110.mail.mud.yahoo.com [209.191.85.220]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAPNHWRI002808 for ; Sun, 25 Nov 2007 15:17:33 -0800 Received: (qmail 10814 invoked from network); 25 Nov 2007 23:17:41 -0000 DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com.au; h=Received:From:To:Subject:Date:User-Agent:Cc:References:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding:Content-Disposition:Message-Id; b=FH/gUxhg7hKkXZ7TBjWjHnVJ2M482+c4guAadN5ae4YZx06VSq/QILRpNqEOc8FHWWRhl0Nsfq6LsQB2VBT0cdSzLFqC+OKtMjkZgQYVHg2zHwOwBJf4iXxF4Fu+CKjHZy52vWlrMOofKezViQIpdCkUrwCSlbD/2ZlYY4TcHoY= ; Received: from unknown (HELO ?192.168.1.2?) 
(nickpiggin@59.167.43.239 with login) by smtp110.mail.mud.yahoo.com with SMTP; 25 Nov 2007 23:17:39 -0000 From: Nick Piggin To: David Chinner Subject: Re: [PATCH 1/9]: introduce radix_tree_gang_lookup_range Date: Mon, 26 Nov 2007 10:17:24 +1100 User-Agent: KMail/1.9.5 Cc: xfs-oss , lkml References: <20071122003211.GG114266761@sgi.com> In-Reply-To: <20071122003211.GG114266761@sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200711261017.24694.nickpiggin@yahoo.com.au> X-Virus-Scanned: ClamAV 0.91.2/4911/Sun Nov 25 09:58:35 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13779 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: nickpiggin@yahoo.com.au Precedence: bulk X-list: xfs On Thursday 22 November 2007 11:32, David Chinner wrote: > Introduce radix_tree_gang_lookup_range() > > The inode clustering in XFS requires a gang lookup on the radix tree to > find all the inodes in the cluster. The gang lookup has to set the > maximum items to that of a fully populated cluster so we get all the > inodes in the cluster, but we only populate the radix tree sparsely (on > demand). > > As a result, the gang lookup can search way, way past the index of end > of the cluster because it is looking for a fixed number of entries to > return. > > We know we want to terminate the search at either a specific index or a > maximum number of items, so we need to add a "last_index" parameter to > the lookup. Yeah, this fixes one downside of the gang lookup API. For consistency it would be nice to do this for the tag lookup API as well... > Furthermore, the existing radix_tree_gang_lookup() can use this same > function if we define a RADIX_TREE_MAX_INDEX value so the search is not > limited by the last_index. Nit: should just define it to be ULONG_MAX. 
> > Signed-off-by: Dave Chinner Otherwise, Acked-by: Nick Piggin From owner-xfs@oss.sgi.com Sun Nov 25 15:29:55 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 25 Nov 2007 15:30:00 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAPNTqAg004879 for ; Sun, 25 Nov 2007 15:29:54 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA29945; Mon, 26 Nov 2007 10:29:56 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAPNTsdD119695766; Mon, 26 Nov 2007 10:29:55 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAPNTrFZ120645262; Mon, 26 Nov 2007 10:29:53 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Mon, 26 Nov 2007 10:29:53 +1100 From: David Chinner To: Nick Piggin Cc: David Chinner , xfs-oss , lkml Subject: Re: [PATCH 1/9]: introduce radix_tree_gang_lookup_range Message-ID: <20071125232953.GF114266761@sgi.com> References: <20071122003211.GG114266761@sgi.com> <200711261017.24694.nickpiggin@yahoo.com.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200711261017.24694.nickpiggin@yahoo.com.au> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4911/Sun Nov 25 09:58:35 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13780 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Mon, Nov 
26, 2007 at 10:17:24AM +1100, Nick Piggin wrote: > On Thursday 22 November 2007 11:32, David Chinner wrote: > > Introduce radix_tree_gang_lookup_range() > > > > The inode clustering in XFS requires a gang lookup on the radix tree to > > find all the inodes in the cluster. The gang lookup has to set the > > maximum items to that of a fully populated cluster so we get all the > > inodes in the cluster, but we only populate the radix tree sparsely (on > > demand). > > > > As a result, the gang lookup can search way, way past the index of end > > of the cluster because it is looking for a fixed number of entries to > > return. > > > > We know we want to terminate the search at either a specific index or a > > maximum number of items, so we need to add a "last_index" parameter to > > the lookup. > > Yeah, this fixes one downside of the gang lookup API. For consistency > it would be nice to do this for the tag lookup API as well... Sure, I have need to do that as well. ;) > > Furthermore, the existing radix_tree_gang_lookup() can use this same > > function if we define a RADIX_TREE_MAX_INDEX value so the search is not > > limited by the last_index. > > Nit: should just define it to be ULONG_MAX. Oh, right. Silly me. I'll post updated radix tree patches later today. Thanks, Nick. Cheers, Dave. 
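For readers following the thread, the bounded lookup discussed above can be sketched in userspace C. This is only an illustration, not the posted patch: the sorted `sketch_slot` array is a hypothetical stand-in for a sparsely populated radix tree, and the `sketch_*` names are invented here. It shows a search that stops at either `last_index` or `max_items`, and an unbounded lookup that reuses it by passing ULONG_MAX, as Nick suggests.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for a sparse radix tree: slots sorted by index. */
struct sketch_slot {
	unsigned long index;
	void *item;
};

/*
 * Collect up to max_items entries whose index lies in
 * [first_index, last_index]. The search stops as soon as it passes
 * last_index, so it never walks beyond the end of the cluster.
 */
static unsigned int
sketch_gang_lookup_range(const struct sketch_slot *slots, size_t nr_slots,
			 void **results, unsigned long first_index,
			 unsigned long last_index, unsigned int max_items)
{
	unsigned int found = 0;
	size_t i;

	for (i = 0; i < nr_slots && found < max_items; i++) {
		if (slots[i].index < first_index)
			continue;
		if (slots[i].index > last_index)
			break;
		results[found++] = slots[i].item;
	}
	return found;
}

/* The unbounded lookup is the range lookup with last_index = ULONG_MAX. */
static unsigned int
sketch_gang_lookup(const struct sketch_slot *slots, size_t nr_slots,
		   void **results, unsigned long first_index,
		   unsigned int max_items)
{
	return sketch_gang_lookup_range(slots, nr_slots, results,
					first_index, ULONG_MAX, max_items);
}
```

With entries at indexes 3, 5, 40 and 41 and a cluster covering indexes 0 to 7, the range lookup returns only the first two entries even when max_items is 8, while the unbounded lookup keeps going and returns all four.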
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Sun Nov 25 16:21:01 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 25 Nov 2007 16:21:04 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAQ0Kuro017169 for ; Sun, 25 Nov 2007 16:20:59 -0800 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA01171; Mon, 26 Nov 2007 11:21:00 +1100 Message-ID: <474A112D.2040006@sgi.com> Date: Mon, 26 Nov 2007 11:19:57 +1100 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.9 (X11/20071031) MIME-Version: 1.0 To: David Chinner CC: xfs-dev , xfs-oss Subject: Re: [PATCH, RFC] Delayed logging of file sizes References: <47467B87.2000000@sgi.com> <20071125225928.GE114266761@sgi.com> In-Reply-To: <20071125225928.GE114266761@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4911/Sun Nov 25 09:58:35 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13781 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs David Chinner wrote: > On Fri, Nov 23, 2007 at 06:04:39PM +1100, Lachlan McIlroy wrote: >> The easy solution is to log everything so that log replay doesn't need >> to check if the on-disk version is newer - it can just replay the log. 
>> But logging everything would cause too much log traffic so this patch >> is a compromise and it logs a transaction before we flush an inode to >> disk only if it has changes that have not yet been logged. > > The problem with this is that the inode will be marked dirty during the > transaction, so we'll never be able to clean an inode if we issue a > transaction during inode writeback. > Ah, yeah, good point. I wrote this patch back before that "dirty inode on transaction" patch went in. For this transaction though the changes to the inode have already been made (ie when we set i_update_core and called mark_inode_dirty_sync()) so there is no need to dirty it in this transaction. I'll keep digging. Thanks. From owner-xfs@oss.sgi.com Sun Nov 25 16:26:52 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 25 Nov 2007 16:26:56 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAQ0QnRd018748 for ; Sun, 25 Nov 2007 16:26:51 -0800 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA01254; Mon, 26 Nov 2007 11:26:48 +1100 Message-ID: <474A1289.2080500@sgi.com> Date: Mon, 26 Nov 2007 11:25:45 +1100 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.9 (X11/20071031) MIME-Version: 1.0 To: Christoph Hellwig CC: xfs-dev , xfs-oss Subject: Re: [PATCH] Fix up xfs_buf_associate_memory() References: <47465712.1050000@sgi.com> <20071123134302.GA4256@infradead.org> In-Reply-To: <20071123134302.GA4256@infradead.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4911/Sun Nov 25 
09:58:35 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13782 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs Christoph Hellwig wrote: > On Fri, Nov 23, 2007 at 03:29:06PM +1100, Lachlan McIlroy wrote: >> Fixed a few bugs in xfs_buf_associate_memory() including: >> >> - calculation of 'page_count' was incorrect as it did not >> consider the offset of 'mem' into the first page. The >> logic to bump 'page_count' didn't work if 'len' was <= >> PAGE_CACHE_SIZE (ie offset = 3k, len = 2k). >> - setting b_buffer_length to 'len' is incorrect if >> 'offset' is > 0. Set it to the total length of the >> buffer. >> - I suspect that passing a non-aligned address into >> mem_to_page() for the first page may have been causing >> issues - don't know but just tidy up that code anyway. >> >> These fixes prevent an data corruption issue that can >> occur during log replay. > > Last time I tried to clean up this gem everything went bezerk, so be > aware :) > > > --- fs/xfs/linux-2.6/xfs_buf.c_1.247 2007-11-23 12:03:16.000000000 +1100 > +++ fs/xfs/linux-2.6/xfs_buf.c 2007-11-23 12:02:32.000000000 +1100 > @@ -726,14 +726,14 @@ xfs_buf_associate_memory( > int rval; > int i = 0; > size_t ptr; > + size_t buflen; > off_t offset; > int page_count; > > + ptr = (size_t) mem & PAGE_CACHE_MASK; > + offset = (off_t) mem - (off_t) ptr; > > Casting pointers to size_t or off_t makes little sense, these should be > unsigned long. And using a variable name of ptr is quite odd :) I just left those as they were before the change. I'll tidy them too. > > + while (i < bp->b_page_count) { > + bp->b_pages[i++] = mem_to_page((void *)ptr); > ptr += PAGE_CACHE_SIZE; > } > > This could be much cleaner written as: > > for (i = 0; i < bp->b_page_count; i++) { > bp->b_pages[i] = mem_to_page((void *)ptr); > ptr += PAGE_CACHE_SIZE; > } > Fine with me. Thanks. 
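The page-count arithmetic described in the patch summary above can be illustrated with a small userspace sketch. It is not the code from the patch: the 4096-byte page size, the `sketch_*` names and the choice to round the total buffer length up to whole pages are assumptions made for illustration. It shows why the offset of 'mem' into its first page has to be included before counting pages.

```c
#include <stddef.h>

/* Assumed page size for the sketch; the kernel uses PAGE_CACHE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

struct sketch_span {
	unsigned long first_page;	/* page-aligned start address */
	unsigned long offset;		/* offset of mem into the first page */
	size_t buflen;			/* total length, in whole pages */
	unsigned int page_count;	/* number of pages the region touches */
};

/*
 * Work out which pages a region of len bytes starting at address mem
 * touches. The offset is added to len before rounding up, so a short
 * region that straddles a page boundary is still counted as two pages.
 */
static struct sketch_span
sketch_span_of(unsigned long mem, size_t len)
{
	struct sketch_span s;

	s.first_page = mem & SKETCH_PAGE_MASK;
	s.offset = mem - s.first_page;
	s.buflen = (s.offset + len + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK;
	s.page_count = (unsigned int)(s.buflen / SKETCH_PAGE_SIZE);
	return s;
}
```

For the case quoted in the mail (offset = 3k, len = 2k) the region ends 5k into the first page, so the sketch gives a page_count of 2 and a total length of 8k, whereas counting pages from 'len' alone would give 1.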
From owner-xfs@oss.sgi.com Sun Nov 25 17:10:42 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 25 Nov 2007 17:10:46 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAQ1Ab8I025111 for ; Sun, 25 Nov 2007 17:10:41 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA02239; Mon, 26 Nov 2007 12:10:45 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAQ1AidD120325012; Mon, 26 Nov 2007 12:10:45 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAQ1Airt120783100; Mon, 26 Nov 2007 12:10:44 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Mon, 26 Nov 2007 12:10:44 +1100 From: David Chinner To: Lachlan McIlroy Cc: David Chinner , xfs-dev , xfs-oss Subject: Re: [PATCH, RFC] Delayed logging of file sizes Message-ID: <20071126011044.GG114266761@sgi.com> References: <47467B87.2000000@sgi.com> <20071125225928.GE114266761@sgi.com> <474A112D.2040006@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <474A112D.2040006@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4911/Sun Nov 25 09:58:35 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13783 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Mon, Nov 26, 2007 at 11:19:57AM +1100, Lachlan McIlroy wrote: > David Chinner wrote: > 
>On Fri, Nov 23, 2007 at 06:04:39PM +1100, Lachlan McIlroy wrote: > >>The easy solution is to log everything so that log replay doesn't need > >>to check if the on-disk version is newer - it can just replay the log. > >>But logging everything would cause too much log traffic so this patch > >>is a compromise and it logs a transaction before we flush an inode to > >>disk only if it has changes that have not yet been logged. > > > >The problem with this is that the inode will be marked dirty during the > >transaction, so we'll never be able to clean an inode if we issue a > >transaction during inode writeback. > > Ah, yeah, good point. I wrote this patch back before that "dirty inode > on transaction" patch went in. Wouldn't have made any difference - the inode would be marked dirty at transaction completion... > For this transaction though the changes > to the inode have already been made (ie when we set i_update_core and > called mark_inode_dirty_sync()) so there is no need to dirty it in this > transaction. I'll keep digging. Thanks. I wouldn't worry too much about this problem right now - I'm working on moving the dirty state into the inode radix trees so i_update_core might even go away completely soon.... Cheers, Dave. 
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Sun Nov 25 17:17:54 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 25 Nov 2007 17:17:58 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAQ1Hoxk026741 for ; Sun, 25 Nov 2007 17:17:53 -0800 Received: from timothy-shimmins-power-mac-g5.local (boing.melbourne.sgi.com [134.14.55.141]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA02566; Mon, 26 Nov 2007 12:17:50 +1100 Message-ID: <474A1F24.9070303@sgi.com> Date: Mon, 26 Nov 2007 12:19:32 +1100 From: Timothy Shimmin User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728) MIME-Version: 1.0 To: Christoph Hellwig CC: Christoph Hellwig , xfs@oss.sgi.com Subject: Re: [PATCH] cleanup XFS_IFORK_*/XFS_DFORK* macros References: <20070922102238.GA15732@lst.de> <20071125163102.GB17922@infradead.org> In-Reply-To: <20071125163102.GB17922@infradead.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4911/Sun Nov 25 09:58:35 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13784 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: tes@sgi.com Precedence: bulk X-list: xfs Christoph Hellwig wrote: > On Sat, Sep 22, 2007 at 12:22:38PM +0200, Christoph Hellwig wrote: >> (try number three, maybe it manages to get through the list this time) >> >> Currently XFS_IFORK_* and XFS_DFORK* are implemented by means of >> XFS_CFORK* macros. 
But given that XFS_IFORK_* operates on an >> xfs_inode that embedds and xfs_icdinode_core and XFS_DFORK_* operates >> on an xfs_dinode that embedds a xfs_dinode_core one will have to do >> endian swapping while the other doesn't. Instead of having the current >> mess with the CFORK macros that have byteswapping and non-byteswapping >> version (which are inconsistantly named while we're at it) just define >> each family of the macros to stand by itself and simplify the whole >> matter. >> >> A few direct references to the CFORK variants were cleaned up to >> use IFORK or DFORK to make this possible. > > ping? this is almost two month old now.. > I'll have a look... --Tim From owner-xfs@oss.sgi.com Sun Nov 25 17:30:37 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 25 Nov 2007 17:30:40 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAQ1UZnr028946 for ; Sun, 25 Nov 2007 17:30:36 -0800 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA02965; Mon, 26 Nov 2007 12:30:39 +1100 Message-ID: <474A2180.7000605@sgi.com> Date: Mon, 26 Nov 2007 12:29:36 +1100 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.9 (X11/20071031) MIME-Version: 1.0 To: David Chinner CC: xfs-dev , xfs-oss Subject: Re: [PATCH, RFC] Delayed logging of file sizes References: <47467B87.2000000@sgi.com> <20071125225928.GE114266761@sgi.com> <474A112D.2040006@sgi.com> <20071126011044.GG114266761@sgi.com> In-Reply-To: <20071126011044.GG114266761@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 
0.91.2/4911/Sun Nov 25 09:58:35 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13785 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs David Chinner wrote: > On Mon, Nov 26, 2007 at 11:19:57AM +1100, Lachlan McIlroy wrote: >> David Chinner wrote: >>> On Fri, Nov 23, 2007 at 06:04:39PM +1100, Lachlan McIlroy wrote: >>>> The easy solution is to log everything so that log replay doesn't need >>>> to check if the on-disk version is newer - it can just replay the log. >>>> But logging everything would cause too much log traffic so this patch >>>> is a compromise and it logs a transaction before we flush an inode to >>>> disk only if it has changes that have not yet been logged. >>> The problem with this is that the inode will be marked dirty during the >>> transaction, so we'll never be able to clean an inode if we issue a >>> transaction during inode writeback. >> Ah, yeah, good point. I wrote this patch back before that "dirty inode >> on transaction" patch went in. > > Wouldn't have made aany difference - the inode woul dbe marked dirty > at transaction completion... > >> For this transaction though the changes >> to the inode have already been made (ie when we set i_update_core and >> called mark_inode_dirty_sync()) so there is no need to dirty it in this >> transaction. I'll keep digging. Thanks. > > I wouldn't worry too much about this problem right now - I'm working > on moving the dirty state into the inode radix trees so i_update_core > might even go away completely soon.... > Which problem? Just the bit about dirtying the inode or will your changes allow us to log all inode changes? What's the motivation for moving the dirty state? 
From owner-xfs@oss.sgi.com Sun Nov 25 18:12:57 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 25 Nov 2007 18:13:00 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAQ2CsjD001439 for ; Sun, 25 Nov 2007 18:12:56 -0800 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA03805; Mon, 26 Nov 2007 13:12:49 +1100 Message-ID: <474A2B62.20204@sgi.com> Date: Mon, 26 Nov 2007 13:11:46 +1100 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.9 (X11/20071031) MIME-Version: 1.0 To: David Chinner CC: xfs-oss , lkml Subject: Re: [PATCH 2/9]: Reduce Log I/O latency References: <20071122003339.GH114266761@sgi.com> In-Reply-To: <20071122003339.GH114266761@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4911/Sun Nov 25 09:58:35 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13786 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs > Index: 2.6.x-xfs-new/fs/xfs/xfs_log.c > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_log.c 2007-11-22 10:47:21.945395328 +1100 > +++ 2.6.x-xfs-new/fs/xfs/xfs_log.c 2007-11-22 10:53:11.556186722 +1100 > @@ -1443,6 +1443,8 @@ xlog_sync(xlog_t *log, > XFS_BUF_ZEROFLAGS(bp); > XFS_BUF_BUSY(bp); > XFS_BUF_ASYNC(bp); > + XFS_BUF_SET_LOGBUF(bp); > + > /* > * Do an ordered write for the log block. > * Its unnecessary to flush the first split block in the log wrap case. 
Whichever way you go with this one Dave you should probably add another XFS_BUF_SET_LOGBUF() call for the buffer split case further down in the same function. From owner-xfs@oss.sgi.com Sun Nov 25 18:15:15 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 25 Nov 2007 18:15:20 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAQ2FAYN002308 for ; Sun, 25 Nov 2007 18:15:14 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA03878; Mon, 26 Nov 2007 13:15:17 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAQ2FGdD120859592; Mon, 26 Nov 2007 13:15:17 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAQ2FFHR120615083; Mon, 26 Nov 2007 13:15:15 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Mon, 26 Nov 2007 13:15:15 +1100 From: David Chinner To: Lachlan McIlroy Cc: David Chinner , xfs-dev , xfs-oss Subject: Re: [PATCH, RFC] Delayed logging of file sizes Message-ID: <20071126021515.GH114266761@sgi.com> References: <47467B87.2000000@sgi.com> <20071125225928.GE114266761@sgi.com> <474A112D.2040006@sgi.com> <20071126011044.GG114266761@sgi.com> <474A2180.7000605@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <474A2180.7000605@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4911/Sun Nov 25 09:58:35 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13787 X-ecartis-version: Ecartis 
v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Mon, Nov 26, 2007 at 12:29:36PM +1100, Lachlan McIlroy wrote: > David Chinner wrote: > >On Mon, Nov 26, 2007 at 11:19:57AM +1100, Lachlan McIlroy wrote: > >>David Chinner wrote: > >>>On Fri, Nov 23, 2007 at 06:04:39PM +1100, Lachlan McIlroy wrote: > >>>>The easy solution is to log everything so that log replay doesn't need > >>>>to check if the on-disk version is newer - it can just replay the log. > >>>>But logging everything would cause too much log traffic so this patch > >>>>is a compromise and it logs a transaction before we flush an inode to > >>>>disk only if it has changes that have not yet been logged. > >>>The problem with this is that the inode will be marked dirty during the > >>>transaction, so we'll never be able to clean an inode if we issue a > >>>transaction during inode writeback. > >>Ah, yeah, good point. I wrote this patch back before that "dirty inode > >>on transaction" patch went in. > > > >Wouldn't have made aany difference - the inode woul dbe marked dirty > >at transaction completion... > > > >>For this transaction though the changes > >>to the inode have already been made (ie when we set i_update_core and > >>called mark_inode_dirty_sync()) so there is no need to dirty it in this > >>transaction. I'll keep digging. Thanks. > > > >I wouldn't worry too much about this problem right now - I'm working > >on moving the dirty state into the inode radix trees so i_update_core > >might even go away completely soon.... > > > > Which problem? Just the bit about dirtying the inode or will your changes > allow us to log all inode changes? Trying to change XFS to logging all updates. > What's the motivation for moving the dirty state? Better inode writeback clustering. i.e. it's easy to find all the dirty inodes and then we can write them in larger contiguous chunks. 
The first "hack" at this I did tracked only inodes in the AIL. Sequential create of small files improved by about 20% with better clustering during tail pushing operations. I'm trying to make it track all dirty inodes at this point (via ->dirty_inode). This may mean that i_update_core is not needed to track whether an inode needs writeback or not. Not to mention all that horrible IPOINTER crap can get removed from xfs_sync_inodes() because finding dirty inodes is now a lockless radix tree traverse based on a dirty tag lookup. That also means the global mount inodes list can be replaced by a lockless radix tree traverse, so we can lose another 2 pointers in the xfs_inode_t and lock operations out of the inode get and reclaim paths. Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Sun Nov 25 19:24:17 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 25 Nov 2007 19:24:21 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAQ3OEw6021859 for ; Sun, 25 Nov 2007 19:24:16 -0800 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA05283; Mon, 26 Nov 2007 14:17:37 +1100 Message-ID: <474A3A92.2040200@sgi.com> Date: Mon, 26 Nov 2007 14:16:34 +1100 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.9 (X11/20071031) MIME-Version: 1.0 To: David Chinner CC: xfs-dev , xfs-oss Subject: Re: [PATCH, RFC] Delayed logging of file sizes References: <47467B87.2000000@sgi.com> <20071125225928.GE114266761@sgi.com> <474A112D.2040006@sgi.com> <20071126011044.GG114266761@sgi.com> <474A2180.7000605@sgi.com> 
<20071126021515.GH114266761@sgi.com> In-Reply-To: <20071126021515.GH114266761@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4911/Sun Nov 25 09:58:35 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13788 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs David Chinner wrote: > On Mon, Nov 26, 2007 at 12:29:36PM +1100, Lachlan McIlroy wrote: >> David Chinner wrote: >>> On Mon, Nov 26, 2007 at 11:19:57AM +1100, Lachlan McIlroy wrote: >>>> David Chinner wrote: >>>>> On Fri, Nov 23, 2007 at 06:04:39PM +1100, Lachlan McIlroy wrote: >>>>>> The easy solution is to log everything so that log replay doesn't need >>>>>> to check if the on-disk version is newer - it can just replay the log. >>>>>> But logging everything would cause too much log traffic so this patch >>>>>> is a compromise and it logs a transaction before we flush an inode to >>>>>> disk only if it has changes that have not yet been logged. >>>>> The problem with this is that the inode will be marked dirty during the >>>>> transaction, so we'll never be able to clean an inode if we issue a >>>>> transaction during inode writeback. >>>> Ah, yeah, good point. I wrote this patch back before that "dirty inode >>>> on transaction" patch went in. >>> Wouldn't have made aany difference - the inode woul dbe marked dirty >>> at transaction completion... >>> >>>> For this transaction though the changes >>>> to the inode have already been made (ie when we set i_update_core and >>>> called mark_inode_dirty_sync()) so there is no need to dirty it in this >>>> transaction. I'll keep digging. Thanks. >>> I wouldn't worry too much about this problem right now - I'm working >>> on moving the dirty state into the inode radix trees so i_update_core >>> might even go away completely soon.... >>> >> Which problem? 
Just the bit about dirtying the inode or will your changes >> allow us to log all inode changes? > > Trying to change XFS to logging all updates. That would be great. But what about the increase in log traffic that has deterred us from doing this in the past? > >> What's the motivation for moving the dirty state? > > Better inode writeback clustering. i.e. it's easy to find all the dirty > inodes and then we can write them in larger contiguous chunks. The first > "hack" at this I did tracked only inodes in the AIL. Sequential create > of small files improved by about 20% with better clustering during > tail pushing operations. I'm trying to make it track all dirty inodes > at this point (via ->dirty_inode). This may mean that i_update_core > is not needed to track whether an inode needs writeback or not. Okay, I'm interested to see what you come up with. > > Not to mention all that horrible IPOINTER crap can get removed from > xfs_sync_inodes() because finding dirty inodes is now a lockless radix > tree traverse based on a dirty tag lookup. Oh good, that macro hackery is ugly. > > That also means the global mount inodes list can be replaced by a lockless radix > tree traverse, so we can lose another 2 pointers in the xfs_inode_t and lock > operations out of the inode get and reclaim paths. 
> From owner-xfs@oss.sgi.com Sun Nov 25 20:21:57 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 25 Nov 2007 20:22:00 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAQ4Lrvd003494 for ; Sun, 25 Nov 2007 20:21:54 -0800 Received: from timothy-shimmins-power-mac-g5.local (boing.melbourne.sgi.com [134.14.55.141]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA06366; Mon, 26 Nov 2007 15:21:51 +1100 Message-ID: <474A4A45.9090600@sgi.com> Date: Mon, 26 Nov 2007 15:23:33 +1100 From: Timothy Shimmin User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728) MIME-Version: 1.0 To: Christoph Hellwig CC: Christoph Hellwig , xfs@oss.sgi.com Subject: Re: [PATCH] cleanup XFS_IFORK_*/XFS_DFORK* macros References: <20070922102238.GA15732@lst.de> <20071125163102.GB17922@infradead.org> In-Reply-To: <20071125163102.GB17922@infradead.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4911/Sun Nov 25 09:58:35 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13789 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: tes@sgi.com Precedence: bulk X-list: xfs Christoph Hellwig wrote: > On Sat, Sep 22, 2007 at 12:22:38PM +0200, Christoph Hellwig wrote: >> (try number three, maybe it manages to get through the list this time) >> >> Currently XFS_IFORK_* and XFS_DFORK* are implemented by means of >> XFS_CFORK* macros. 
But given that XFS_IFORK_* operates on an >> xfs_inode that embedds and xfs_icdinode_core and XFS_DFORK_* operates >> on an xfs_dinode that embedds a xfs_dinode_core one will have to do >> endian swapping while the other doesn't. Instead of having the current >> mess with the CFORK macros that have byteswapping and non-byteswapping >> version (which are inconsistantly named while we're at it) just define >> each family of the macros to stand by itself and simplify the whole >> matter. >> >> A few direct references to the CFORK variants were cleaned up to >> use IFORK or DFORK to make this possible. > > ping? this is almost two month old now.. > Yeah, this looks good to me. Good-bye CFORK macros. I guess the downside is that some commonality will now be in 2 places. For example, if forkoff changed to mean something other than in multiples of 8 bytes (i.e. so we no longer shift by 3) then we'd now need to change that in 2 files. (That aint going to happen) So to check consistency I compared xfs_dinode.h with xfs_inode.h macro definitions which wouldn't be needed before. However, I think the simplification is worth it. --Tim >> >> Signed-off-by: Christoph Hellwig >> >> Index: linux-2.6-xfs/fs/xfs/xfs_dinode.h >> =================================================================== >> --- linux-2.6-xfs.orig/fs/xfs/xfs_dinode.h 2007-08-23 18:52:49.000000000 +0200 >> +++ linux-2.6-xfs/fs/xfs/xfs_dinode.h 2007-09-19 15:16:30.000000000 +0200 >> @@ -171,69 +171,35 @@ typedef enum xfs_dinode_fmt >> /* >> * Inode data & attribute fork sizes, per inode. >> */ >> -#define XFS_CFORK_Q(dcp) ((dcp)->di_forkoff != 0) >> -#define XFS_CFORK_Q_DISK(dcp) ((dcp)->di_forkoff != 0) >> - >> -#define XFS_CFORK_BOFF(dcp) ((int)((dcp)->di_forkoff << 3)) >> -#define XFS_CFORK_BOFF_DISK(dcp) ((int)((dcp)->di_forkoff << 3)) >> - >> -#define XFS_CFORK_DSIZE_DISK(dcp,mp) \ >> - (XFS_CFORK_Q_DISK(dcp) ? 
XFS_CFORK_BOFF_DISK(dcp) : XFS_LITINO(mp)) >> -#define XFS_CFORK_DSIZE(dcp,mp) \ >> - (XFS_CFORK_Q(dcp) ? XFS_CFORK_BOFF(dcp) : XFS_LITINO(mp)) >> - >> -#define XFS_CFORK_ASIZE_DISK(dcp,mp) \ >> - (XFS_CFORK_Q_DISK(dcp) ? XFS_LITINO(mp) - XFS_CFORK_BOFF_DISK(dcp) : 0) >> -#define XFS_CFORK_ASIZE(dcp,mp) \ >> - (XFS_CFORK_Q(dcp) ? XFS_LITINO(mp) - XFS_CFORK_BOFF(dcp) : 0) >> - >> -#define XFS_CFORK_SIZE_DISK(dcp,mp,w) \ >> - ((w) == XFS_DATA_FORK ? \ >> - XFS_CFORK_DSIZE_DISK(dcp, mp) : \ >> - XFS_CFORK_ASIZE_DISK(dcp, mp)) >> -#define XFS_CFORK_SIZE(dcp,mp,w) \ >> - ((w) == XFS_DATA_FORK ? \ >> - XFS_CFORK_DSIZE(dcp, mp) : XFS_CFORK_ASIZE(dcp, mp)) >> +#define XFS_DFORK_Q(dip) ((dip)->di_core.di_forkoff != 0) >> +#define XFS_DFORK_BOFF(dip) ((int)((dip)->di_core.di_forkoff << 3)) >> >> #define XFS_DFORK_DSIZE(dip,mp) \ >> - XFS_CFORK_DSIZE_DISK(&(dip)->di_core, mp) >> -#define XFS_DFORK_DSIZE_HOST(dip,mp) \ >> - XFS_CFORK_DSIZE(&(dip)->di_core, mp) >> + (XFS_DFORK_Q(dip) ? \ >> + XFS_DFORK_BOFF(dip) : \ >> + XFS_LITINO(mp)) >> #define XFS_DFORK_ASIZE(dip,mp) \ >> - XFS_CFORK_ASIZE_DISK(&(dip)->di_core, mp) >> -#define XFS_DFORK_ASIZE_HOST(dip,mp) \ >> - XFS_CFORK_ASIZE(&(dip)->di_core, mp) >> -#define XFS_DFORK_SIZE(dip,mp,w) \ >> - XFS_CFORK_SIZE_DISK(&(dip)->di_core, mp, w) >> -#define XFS_DFORK_SIZE_HOST(dip,mp,w) \ >> - XFS_CFORK_SIZE(&(dip)->di_core, mp, w) >> - >> -#define XFS_DFORK_Q(dip) XFS_CFORK_Q_DISK(&(dip)->di_core) >> -#define XFS_DFORK_BOFF(dip) XFS_CFORK_BOFF_DISK(&(dip)->di_core) >> -#define XFS_DFORK_DPTR(dip) ((dip)->di_u.di_c) >> -#define XFS_DFORK_APTR(dip) \ >> - ((dip)->di_u.di_c + XFS_DFORK_BOFF(dip)) >> -#define XFS_DFORK_PTR(dip,w) \ >> - ((w) == XFS_DATA_FORK ? XFS_DFORK_DPTR(dip) : XFS_DFORK_APTR(dip)) >> -#define XFS_CFORK_FORMAT(dcp,w) \ >> - ((w) == XFS_DATA_FORK ? (dcp)->di_format : (dcp)->di_aformat) >> -#define XFS_CFORK_FMT_SET(dcp,w,n) \ >> + (XFS_DFORK_Q(dip) ? 
\ >> + XFS_LITINO(mp) - XFS_DFORK_BOFF(dip) : \ >> + 0) >> +#define XFS_DFORK_SIZE(dip,mp,w) \ >> ((w) == XFS_DATA_FORK ? \ >> - ((dcp)->di_format = (n)) : ((dcp)->di_aformat = (n))) >> -#define XFS_DFORK_FORMAT(dip,w) XFS_CFORK_FORMAT(&(dip)->di_core, w) >> + XFS_DFORK_DSIZE(dip, mp) : \ >> + XFS_DFORK_ASIZE(dip, mp)) >> >> -#define XFS_CFORK_NEXTENTS_DISK(dcp,w) \ >> +#define XFS_DFORK_DPTR(dip) ((dip)->di_u.di_c) >> +#define XFS_DFORK_APTR(dip) \ >> + ((dip)->di_u.di_c + XFS_DFORK_BOFF(dip)) >> +#define XFS_DFORK_PTR(dip,w) \ >> + ((w) == XFS_DATA_FORK ? XFS_DFORK_DPTR(dip) : XFS_DFORK_APTR(dip)) >> +#define XFS_DFORK_FORMAT(dip,w) \ >> ((w) == XFS_DATA_FORK ? \ >> - be32_to_cpu((dcp)->di_nextents) : \ >> - be16_to_cpu((dcp)->di_anextents)) >> -#define XFS_CFORK_NEXTENTS(dcp,w) \ >> - ((w) == XFS_DATA_FORK ? (dcp)->di_nextents : (dcp)->di_anextents) >> -#define XFS_DFORK_NEXTENTS(dip,w) XFS_CFORK_NEXTENTS_DISK(&(dip)->di_core, w) >> -#define XFS_DFORK_NEXTENTS_HOST(dip,w) XFS_CFORK_NEXTENTS(&(dip)->di_core, w) >> - >> -#define XFS_CFORK_NEXT_SET(dcp,w,n) \ >> + (dip)->di_core.di_format : \ >> + (dip)->di_core.di_aformat) >> +#define XFS_DFORK_NEXTENTS(dip,w) \ >> ((w) == XFS_DATA_FORK ? \ >> - ((dcp)->di_nextents = (n)) : ((dcp)->di_anextents = (n))) >> + be32_to_cpu((dip)->di_core.di_nextents) : \ >> + be16_to_cpu((dip)->di_core.di_anextents)) >> >> #define XFS_BUF_TO_DINODE(bp) ((xfs_dinode_t *)XFS_BUF_PTR(bp)) >> >> Index: linux-2.6-xfs/fs/xfs/xfs_inode.h >> =================================================================== >> --- linux-2.6-xfs.orig/fs/xfs/xfs_inode.h 2007-09-19 15:09:31.000000000 +0200 >> +++ linux-2.6-xfs/fs/xfs/xfs_inode.h 2007-09-19 15:16:30.000000000 +0200 >> @@ -341,17 +341,42 @@ xfs_iflags_test_and_clear(xfs_inode_t *i >> /* >> * Fork handling. >> */ >> -#define XFS_IFORK_PTR(ip,w) \ >> - ((w) == XFS_DATA_FORK ? 
&(ip)->i_df : (ip)->i_afp) >> -#define XFS_IFORK_Q(ip) XFS_CFORK_Q(&(ip)->i_d) >> -#define XFS_IFORK_DSIZE(ip) XFS_CFORK_DSIZE(&ip->i_d, ip->i_mount) >> -#define XFS_IFORK_ASIZE(ip) XFS_CFORK_ASIZE(&ip->i_d, ip->i_mount) >> -#define XFS_IFORK_SIZE(ip,w) XFS_CFORK_SIZE(&ip->i_d, ip->i_mount, w) >> -#define XFS_IFORK_FORMAT(ip,w) XFS_CFORK_FORMAT(&ip->i_d, w) >> -#define XFS_IFORK_FMT_SET(ip,w,n) XFS_CFORK_FMT_SET(&ip->i_d, w, n) >> -#define XFS_IFORK_NEXTENTS(ip,w) XFS_CFORK_NEXTENTS(&ip->i_d, w) >> -#define XFS_IFORK_NEXT_SET(ip,w,n) XFS_CFORK_NEXT_SET(&ip->i_d, w, n) >> >> +#define XFS_IFORK_Q(ip) ((ip)->i_d.di_forkoff != 0) >> +#define XFS_IFORK_BOFF(ip) ((int)((ip)->i_d.di_forkoff << 3)) >> + >> +#define XFS_IFORK_PTR(ip,w) \ >> + ((w) == XFS_DATA_FORK ? \ >> + &(ip)->i_df : \ >> + (ip)->i_afp) >> +#define XFS_IFORK_DSIZE(ip) \ >> + (XFS_IFORK_Q(ip) ? \ >> + XFS_IFORK_BOFF(ip) : \ >> + XFS_LITINO((ip)->i_mount)) >> +#define XFS_IFORK_ASIZE(ip) \ >> + (XFS_IFORK_Q(ip) ? \ >> + XFS_LITINO((ip)->i_mount) - XFS_IFORK_BOFF(ip) : \ >> + 0) >> +#define XFS_IFORK_SIZE(ip,w) \ >> + ((w) == XFS_DATA_FORK ? \ >> + XFS_IFORK_DSIZE(ip) : \ >> + XFS_IFORK_ASIZE(ip)) >> +#define XFS_IFORK_FORMAT(ip,w) \ >> + ((w) == XFS_DATA_FORK ? \ >> + (ip)->i_d.di_format : \ >> + (ip)->i_d.di_aformat) >> +#define XFS_IFORK_FMT_SET(ip,w,n) \ >> + ((w) == XFS_DATA_FORK ? \ >> + ((ip)->i_d.di_format = (n)) : \ >> + ((ip)->i_d.di_aformat = (n))) >> +#define XFS_IFORK_NEXTENTS(ip,w) \ >> + ((w) == XFS_DATA_FORK ? \ >> + (ip)->i_d.di_nextents : \ >> + (ip)->i_d.di_anextents) >> +#define XFS_IFORK_NEXT_SET(ip,w,n) \ >> + ((w) == XFS_DATA_FORK ? 
\ >> + ((ip)->i_d.di_nextents = (n)) : \ >> + ((ip)->i_d.di_anextents = (n))) >> >> #ifdef __KERNEL__ >> >> @@ -504,7 +529,7 @@ void xfs_dinode_to_disk(struct xfs_dino >> struct xfs_icdinode *); >> >> uint xfs_ip2xflags(struct xfs_inode *); >> -uint xfs_dic2xflags(struct xfs_dinode_core *); >> +uint xfs_dic2xflags(struct xfs_dinode *); >> int xfs_ifree(struct xfs_trans *, xfs_inode_t *, >> struct xfs_bmap_free *); >> int xfs_itruncate_start(xfs_inode_t *, uint, xfs_fsize_t); >> Index: linux-2.6-xfs/fs/xfs/xfs_inode.c >> =================================================================== >> --- linux-2.6-xfs.orig/fs/xfs/xfs_inode.c 2007-09-19 15:09:31.000000000 +0200 >> +++ linux-2.6-xfs/fs/xfs/xfs_inode.c 2007-09-19 15:16:30.000000000 +0200 >> @@ -826,15 +826,17 @@ xfs_ip2xflags( >> xfs_icdinode_t *dic = &ip->i_d; >> >> return _xfs_dic2xflags(dic->di_flags) | >> - (XFS_CFORK_Q(dic) ? XFS_XFLAG_HASATTR : 0); >> + (XFS_IFORK_Q(ip) ? XFS_XFLAG_HASATTR : 0); >> } >> >> uint >> xfs_dic2xflags( >> - xfs_dinode_core_t *dic) >> + xfs_dinode_t *dip) >> { >> + xfs_dinode_core_t *dic = &dip->di_core; >> + >> return _xfs_dic2xflags(be16_to_cpu(dic->di_flags)) | >> - (XFS_CFORK_Q_DISK(dic) ? XFS_XFLAG_HASATTR : 0); >> + (XFS_DFORK_Q(dip) ? 
XFS_XFLAG_HASATTR : 0); >> } >> >> /* >> Index: linux-2.6-xfs/fs/xfs/xfs_itable.c >> =================================================================== >> --- linux-2.6-xfs.orig/fs/xfs/xfs_itable.c 2007-09-12 13:56:17.000000000 +0200 >> +++ linux-2.6-xfs/fs/xfs/xfs_itable.c 2007-09-19 15:16:30.000000000 +0200 >> @@ -170,7 +170,7 @@ xfs_bulkstat_one_dinode( >> buf->bs_mtime.tv_nsec = be32_to_cpu(dic->di_mtime.t_nsec); >> buf->bs_ctime.tv_sec = be32_to_cpu(dic->di_ctime.t_sec); >> buf->bs_ctime.tv_nsec = be32_to_cpu(dic->di_ctime.t_nsec); >> - buf->bs_xflags = xfs_dic2xflags(dic); >> + buf->bs_xflags = xfs_dic2xflags(dip); >> buf->bs_extsize = be32_to_cpu(dic->di_extsize) << mp->m_sb.sb_blocklog; >> buf->bs_extents = be32_to_cpu(dic->di_nextents); >> buf->bs_gen = be32_to_cpu(dic->di_gen); >> @@ -299,7 +299,7 @@ xfs_bulkstat_use_dinode( >> } >> /* BULKSTAT_FG_INLINE: if attr fork is local, or not there, use it */ >> aformat = dip->di_core.di_aformat; >> - if ((XFS_CFORK_Q(&dip->di_core) == 0) || >> + if ((XFS_DFORK_Q(dip) == 0) || >> (aformat == XFS_DINODE_FMT_LOCAL) || >> (aformat == XFS_DINODE_FMT_EXTENTS && !dip->di_core.di_anextents)) { >> *dipp = dip; >> Index: linux-2.6-xfs/fs/xfs/dmapi/xfs_dm.c >> =================================================================== >> --- linux-2.6-xfs.orig/fs/xfs/dmapi/xfs_dm.c 2007-09-19 18:50:35.000000000 +0200 >> +++ linux-2.6-xfs/fs/xfs/dmapi/xfs_dm.c 2007-09-19 18:51:01.000000000 +0200 >> @@ -355,7 +355,7 @@ xfs_ip2dmflags( >> xfs_inode_t *ip) >> { >> return _xfs_dic2dmflags(ip->i_d.di_flags) | >> - (XFS_CFORK_Q(&ip->i_d) ? DM_XFLAG_HASATTR : 0); >> + (XFS_IFORK_Q(ip) ? DM_XFLAG_HASATTR : 0); >> } >> >> STATIC uint >> @@ -363,8 +363,7 @@ xfs_dic2dmflags( >> xfs_dinode_t *dip) >> { >> return _xfs_dic2dmflags(be16_to_cpu(dip->di_core.di_flags)) | >> - (XFS_CFORK_Q_DISK(&dip->di_core) ? >> - DM_XFLAG_HASATTR : 0); >> + (XFS_DFORK_Q(dip) ? 
DM_XFLAG_HASATTR : 0); >> } >> >> /* >> >> > ---end quoted text--- > From owner-xfs@oss.sgi.com Sun Nov 25 21:02:57 2007 Received: with ECARTIS (v1.0.0; list xfs); Sun, 25 Nov 2007 21:03:00 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAQ52scn009028 for ; Sun, 25 Nov 2007 21:02:55 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA07255; Mon, 26 Nov 2007 16:03:01 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAQ530dD120163769; Mon, 26 Nov 2007 16:03:01 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAQ530Mb120833169; Mon, 26 Nov 2007 16:03:00 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Mon, 26 Nov 2007 16:03:00 +1100 From: David Chinner To: Lachlan McIlroy Cc: David Chinner , xfs-dev , xfs-oss Subject: Re: [PATCH, RFC] Delayed logging of file sizes Message-ID: <20071126050300.GI114266761@sgi.com> References: <47467B87.2000000@sgi.com> <20071125225928.GE114266761@sgi.com> <474A112D.2040006@sgi.com> <20071126011044.GG114266761@sgi.com> <474A2180.7000605@sgi.com> <20071126021515.GH114266761@sgi.com> <474A3A92.2040200@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <474A3A92.2040200@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4912/Sun Nov 25 20:34:30 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13790 X-ecartis-version: Ecartis v1.0.0 Sender: 
xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Mon, Nov 26, 2007 at 02:16:34PM +1100, Lachlan McIlroy wrote: > David Chinner wrote: > >On Mon, Nov 26, 2007 at 12:29:36PM +1100, Lachlan McIlroy wrote: > >>David Chinner wrote: > >>>On Mon, Nov 26, 2007 at 11:19:57AM +1100, Lachlan McIlroy wrote: > >>>>David Chinner wrote: > >>>>>On Fri, Nov 23, 2007 at 06:04:39PM +1100, Lachlan McIlroy wrote: > >>>>>>The easy solution is to log everything so that log replay doesn't need > >>>>>>to check if the on-disk version is newer - it can just replay the log. > >>>>>>But logging everything would cause too much log traffic so this patch > >>>>>>is a compromise and it logs a transaction before we flush an inode to > >>>>>>disk only if it has changes that have not yet been logged. > >>>>>The problem with this is that the inode will be marked dirty during the > >>>>>transaction, so we'll never be able to clean an inode if we issue a > >>>>>transaction during inode writeback. > >>>>Ah, yeah, good point. I wrote this patch back before that "dirty inode > >>>>on transaction" patch went in. > >>>Wouldn't have made aany difference - the inode woul dbe marked dirty > >>>at transaction completion... > >>> > >>>>For this transaction though the changes > >>>>to the inode have already been made (ie when we set i_update_core and > >>>>called mark_inode_dirty_sync()) so there is no need to dirty it in this > >>>>transaction. I'll keep digging. Thanks. > >>>I wouldn't worry too much about this problem right now - I'm working > >>>on moving the dirty state into the inode radix trees so i_update_core > >>>might even go away completely soon.... > >>> > >>Which problem? Just the bit about dirtying the inode or will your changes > >>allow us to log all inode changes? > > > >Trying to change XFS to logging all updates. > > That would be great. 
But what about the increase in log traffic that has
> deterred us from doing this in the past?

Sorry, I wasn't particularly clear. What I meant was that i_update_core
might disappear completely with the changes I'm making.

Basically, we have three different methods of marking the inode dirty
at the moment - on the linux inode (mark_inode_dirty[_sync]()), via
i_update_core = 1 for unlogged changes, and via the inode log item in
the AIL for logged changes.

On top of that, we have three different methods of flushing them - one
from the generic code for inodes dirtied by mark_inode_dirty(), one from
xfssyncd for inodes that are only dirtied by setting i_update_core = 1
and the other from the xfsaild when log tail pushing.

Ideally we should only have a single method for pushing out inodes. The first
step to that is tracking the dirty state in a single tree (the inode radix
trees). That means we have to hook ->dirty_inode() to catch all dirtying via
mark_inode_dirty[_sync]() and mark the inodes dirty in the radix tree. Then we
need to use xfs_mark_inode_dirty_sync() everywhere that we dirty the inode.

Once we have all the dirty state in the radix trees we can get rid of
i_update_core and i_update_size - all they do is mark the inode dirty and
we don't really care about the difference between them(*) - and just use
the dirty bit in the radix tree when necessary.

To flush the dirty inodes we just do radix_tree_gang_lookup_tag_range()
calls to do ascending cluster order writeback. This will replace the
mount inode list walking in xfs_sync_inodes() and other places to find
dirty inodes.

/me puts on flame-proof suit

I'd even like to go as far as a two pass writeback algorithm; pass
one only writes data, and pass two only writes inodes. The second pass
for XFS needs to be delayed until data writeback is complete because of
delalloc and inode size updates redirtying the inode. The current
mechanism means we often do two inode writes for the one data write...
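
As an illustration of the single dirty-tag scheme and the ascending-order
flush described above, here is a minimal userspace sketch in C. It is not
XFS or kernel code and does not come from this thread: a flat bitmap stands
in for the inode radix tree, and every name in it (struct inode_index,
index_mark_dirty(), index_gang_lookup_dirty(), index_flush_all(), NR_INODES)
is hypothetical. It assumes inode numbers are smaller than NR_INODES.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for an inode radix tree carrying one "dirty" tag.
 * A real radix tree is sparse; a flat bitmap keeps this sketch short.
 */
#define NR_INODES     256u
#define BITS_PER_WORD 64u

struct inode_index {
	uint64_t dirty[NR_INODES / BITS_PER_WORD];
};

static void index_init(struct inode_index *idx)
{
	memset(idx, 0, sizeof(*idx));
}

/* Single entry point for dirtying, the role a ->dirty_inode hook would play. */
static void index_mark_dirty(struct inode_index *idx, unsigned int ino)
{
	if (ino >= NR_INODES)
		return;
	idx->dirty[ino / BITS_PER_WORD] |= (uint64_t)1 << (ino % BITS_PER_WORD);
}

static void index_clear_dirty(struct inode_index *idx, unsigned int ino)
{
	if (ino >= NR_INODES)
		return;
	idx->dirty[ino / BITS_PER_WORD] &= ~((uint64_t)1 << (ino % BITS_PER_WORD));
}

static int index_is_dirty(const struct inode_index *idx, unsigned int ino)
{
	if (ino >= NR_INODES)
		return 0;
	return (int)((idx->dirty[ino / BITS_PER_WORD] >> (ino % BITS_PER_WORD)) & 1);
}

/* Gang lookup: collect up to max dirty inode numbers >= first, ascending. */
static size_t index_gang_lookup_dirty(const struct inode_index *idx,
				      unsigned int first,
				      unsigned int *results, size_t max)
{
	size_t found = 0;

	for (unsigned int ino = first; ino < NR_INODES && found < max; ino++)
		if (index_is_dirty(idx, ino))
			results[found++] = ino;
	return found;
}

/*
 * Flush every dirty inode in ascending order, one batch at a time.
 * "Writing" an inode here just clears its tag; order[] records the
 * sequence so the ascending property can be checked by the caller.
 */
static size_t index_flush_all(struct inode_index *idx,
			      unsigned int *order, size_t order_max)
{
	unsigned int batch[8];
	unsigned int next = 0;
	size_t written = 0, n;

	while ((n = index_gang_lookup_dirty(idx, next, batch, 8)) > 0) {
		for (size_t i = 0; i < n; i++) {
			index_clear_dirty(idx, batch[i]);
			if (written < order_max)
				order[written] = batch[i];
			written++;
		}
		next = batch[n - 1] + 1;	/* resume after the last hit */
	}
	return written;
}
```

Marking inodes 130, 5 and 64 dirty in any order and then calling
index_flush_all() visits them as 5, 64, 130, which is the property that
makes clustered writeback possible. Dirtying the same inode twice still
results in a single write.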
Basically, our writeback code is a mess and I want to clean it up before we try to deal with the unlogged changes.... Cheers, Dave. (*) Even for FDATASYNC we should always force the log out because we may have delayed allocation transactions still sitting in iclog buffers. This, AFAICT, is a bug in the current implementation. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Mon Nov 26 13:08:49 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 26 Nov 2007 13:08:53 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAQL8hR4001162 for ; Mon, 26 Nov 2007 13:08:47 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id IAA01524; Tue, 27 Nov 2007 08:08:50 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAQL8ldD120802076; Tue, 27 Nov 2007 08:08:48 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAQL8ibq122012951; Tue, 27 Nov 2007 08:08:44 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Tue, 27 Nov 2007 08:08:44 +1100 From: David Chinner To: David Chinner , linux-kernel@vger.kernel.org Cc: rjw@sisk.pl, xfs@oss.sgi.com Subject: Re: XFS related Oops (suspend/resume related) Message-ID: <20071126210844.GB119954183@sgi.com> References: <20071112064706.GA23595@dose.home.local> <20071112222720.GG995458@sgi.com> <20071113105119.GA11527@dose.home.local> <20071113230445.GE995458@sgi.com> <20071126131210.GA4430@eazy.amigager.de> 
Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071126131210.GA4430@eazy.amigager.de> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4928/Mon Nov 26 10:10:39 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13791 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Mon, Nov 26, 2007 at 02:12:10PM +0100, Tino Keitel wrote: > On Wed, Nov 14, 2007 at 10:04:45 +1100, David Chinner wrote: > > On Tue, Nov 13, 2007 at 11:51:19AM +0100, Tino Keitel wrote: > > > On Tue, Nov 13, 2007 at 09:27:20 +1100, David Chinner wrote: > > > > > > [...] > > > > > > > No. I'd say something got screwed up during suspend/resume. Is it > > > > reproducable? > > > > > > No. I often use suspend to RAM, and usually it works without such > > > failures. I restart squid during the resume prosecure, and the above > > > Oops lead to a squid in D state. > > > > Ok. Sounds like there's not much we can debug at this point. Thanks > > for the report, though. > > I got a similar Oops again: > > xfs_iget_core: ambiguous vns: vp/0xc00700c0, invp/0xcb5a1680 Now there's a message that I haven't seen in about 3 years. It indicates that the linux inode connected to the xfs_inode is not the correct one. i.e. that the linux inode cache is out of step with the XFS inode cache. Basically, that is not supposed to happen. I suspect that the way threads are frozen is resulting in an inode lookup racing with a reclaim. The reclaim thread gets stopped after any use threads, and so we could have the situation that a process blocked in lookup has the XFS inode reclaimed and reused before it gets unblocked. The question is why is it happening now when none of that code in XFS has changed? Rafael, when are threads frozen? Only when they schedule or call try_to_freeze()? Did the freezer mechanism change in 2.6.23 (this is on 2.6.23.1)? 
Is there some way of getting a stack trace of all the processes in the system once the machine is frozen and about to suspend so we can see if we blocked in a lookup? Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Mon Nov 26 14:21:14 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 26 Nov 2007 14:21:17 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: * X-Spam-Status: No, score=2.0 required=5.0 tests=AWL,BAYES_00, RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_PSBL autolearn=no version=3.3.0-r574664 Received: from ogre.sisk.pl (ogre.sisk.pl [217.79.144.158]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAQMLAlS011839 for ; Mon, 26 Nov 2007 14:21:13 -0800 Received: from localhost (localhost.localdomain [127.0.0.1]) by ogre.sisk.pl (Postfix) with ESMTP id 3FC9D73468; Mon, 26 Nov 2007 22:48:02 +0100 (CET) Received: from ogre.sisk.pl ([127.0.0.1]) by localhost (ogre.sisk.pl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 32565-01-6; Mon, 26 Nov 2007 22:47:55 +0100 (CET) Received: from [192.168.100.119] (nat-be3.aster.pl [212.76.37.200]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by ogre.sisk.pl (Postfix) with ESMTP id 79D387347E; Mon, 26 Nov 2007 22:47:14 +0100 (CET) From: "Rafael J. 
Wysocki" To: David Chinner Subject: Re: XFS related Oops (suspend/resume related) Date: Mon, 26 Nov 2007 23:07:56 +0100 User-Agent: KMail/1.9.6 (enterprise 20070904.708012) Cc: linux-kernel@vger.kernel.org, xfs@oss.sgi.com References: <20071112064706.GA23595@dose.home.local> <20071126131210.GA4430@eazy.amigager.de> <20071126210844.GB119954183@sgi.com> In-Reply-To: <20071126210844.GB119954183@sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200711262307.56742.rjw@sisk.pl> X-Virus-Scanned: ClamAV 0.91.2/4928/Mon Nov 26 10:10:39 2007 on oss.sgi.com X-Virus-Scanned: amavisd-new at ogre.sisk.pl using MkS_Vir for Linux X-Virus-Status: Clean X-archive-position: 13792 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: rjw@sisk.pl Precedence: bulk X-list: xfs On Monday, 26 of November 2007, David Chinner wrote: > On Mon, Nov 26, 2007 at 02:12:10PM +0100, Tino Keitel wrote: > > On Wed, Nov 14, 2007 at 10:04:45 +1100, David Chinner wrote: > > > On Tue, Nov 13, 2007 at 11:51:19AM +0100, Tino Keitel wrote: > > > > On Tue, Nov 13, 2007 at 09:27:20 +1100, David Chinner wrote: > > > > > > > > [...] > > > > > > > > > No. I'd say something got screwed up during suspend/resume. Is it > > > > > reproducable? > > > > > > > > No. I often use suspend to RAM, and usually it works without such > > > > failures. I restart squid during the resume prosecure, and the above > > > > Oops lead to a squid in D state. > > > > > > Ok. Sounds like there's not much we can debug at this point. Thanks > > > for the report, though. > > > > I got a similar Oops again: > > > > xfs_iget_core: ambiguous vns: vp/0xc00700c0, invp/0xcb5a1680 > > Now there's a message that I haven't seen in about 3 years. > > It indicates that the linux inode connected to the xfs_inode is not > the correct one. i.e. 
that the linux inode cache is out of step with
> the XFS inode cache.
>
> Basically, that is not supposed to happen. I suspect that the way
> threads are frozen is resulting in an inode lookup racing with
> a reclaim. The reclaim thread gets stopped after any user threads,
> and so we could have the situation that a process blocked in lookup
> has the XFS inode reclaimed and reused before it gets unblocked.
>
> The question is why is it happening now when none of that code in
> XFS has changed?
>
> Rafael, when are threads frozen? Only when they schedule or call
> try_to_freeze()?

Kernel threads freeze only when they call try_to_freeze(). User space tasks
freeze while executing the signal handling code.

> Did the freezer mechanism change in 2.6.23 (this is on 2.6.23.1)?

Yes. Kernel threads are no longer sent fake signals by the freezer.

> Is there some way of getting a stack trace of all the
> processes in the system once the machine is frozen and about to
> suspend so we can see if we blocked in a lookup?

Yes. Please add show_state() before the last "return" in freeze_processes().
On 2.6.23.1 you can test the freezer alone by doing

# echo testproc > /sys/power/disk
# echo disk > /sys/power/state

Greetings,
Rafael

From owner-xfs@oss.sgi.com Mon Nov 26 18:22:56 2007
Received: with ECARTIS (v1.0.0; list xfs); Mon, 26 Nov 2007 18:23:00 -0800 (PST)
X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com
X-Spam-Level:
X-Spam-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664
Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAR2MqVu008647 for ; Mon, 26 Nov 2007 18:22:54 -0800
Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA10956; Tue, 27 Nov 2007 13:22:57 +1100
Received: by chook.melbourne.sgi.com (Postfix, from userid 44625) id 0CD3458C4C0F; Tue, 27 Nov 2007 13:22:56 +1100 (EST)
To: sgi.bugs.xfs@engr.sgi.com, xfs@oss.sgi.com
Subject: TAKE 971596 - Fixed a few bugs in xfs_buf_associate_memory()
Message-Id: <20071127022257.0CD3458C4C0F@chook.melbourne.sgi.com>
Date: Tue, 27 Nov 2007 13:22:56 +1100 (EST)
From: lachlan@sgi.com (Lachlan McIlroy)
X-Virus-Scanned: ClamAV 0.91.2/4928/Mon Nov 26 10:10:39 2007 on oss.sgi.com
X-Virus-Status: Clean
X-archive-position: 13793
X-ecartis-version: Ecartis v1.0.0
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
X-original-sender: lachlan@sgi.com
Precedence: bulk
X-list: xfs

Fixed a few bugs in xfs_buf_associate_memory()

- calculation of 'page_count' was incorrect as it did not consider the
  offset of 'mem' into the first page. The logic to bump 'page_count'
  didn't work if 'len' was <= PAGE_CACHE_SIZE (e.g. offset = 3k, len = 2k).
- setting b_buffer_length to 'len' is incorrect if 'offset' is > 0.
  Set it to the total length of the buffer.
- I suspect that passing a non-aligned address into mem_to_page() for the first page may have been causing issues - don't know but just tidy up that code anyway. Date: Tue Nov 27 13:21:49 AEDT 2007 Workarea: redback.melbourne.sgi.com:/home/lachlan/isms/2.6.x-bufmem Inspected by: hch Author: lachlan The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/linux/2.6.x-xfs-melb Modid: xfs-linux-melb:xfs-kern:30143a fs/xfs/linux-2.6/xfs_buf.c - 1.249 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.6/xfs_buf.c.diff?r1=text&tr1=1.249&r2=text&tr2=1.248&f=h - Fixed a few bugs in xfs_buf_associate_memory(). From owner-xfs@oss.sgi.com Mon Nov 26 19:31:27 2007 Received: with ECARTIS (v1.0.0; list xfs); Mon, 26 Nov 2007 19:31:31 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAR3VMrZ016411 for ; Mon, 26 Nov 2007 19:31:26 -0800 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA12761; Tue, 27 Nov 2007 14:31:25 +1100 Message-ID: <474B8F51.5030102@sgi.com> Date: Tue, 27 Nov 2007 14:30:25 +1100 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.9 (X11/20071031) MIME-Version: 1.0 To: David Chinner CC: xfs-dev , xfs-oss Subject: Re: [PATCH, RFC] Delayed logging of file sizes References: <47467B87.2000000@sgi.com> <20071125225928.GE114266761@sgi.com> <474A112D.2040006@sgi.com> <20071126011044.GG114266761@sgi.com> <474A2180.7000605@sgi.com> <20071126021515.GH114266761@sgi.com> <474A3A92.2040200@sgi.com> <20071126050300.GI114266761@sgi.com> In-Reply-To: <20071126050300.GI114266761@sgi.com> Content-Type: text/plain; 
charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4928/Mon Nov 26 10:10:39 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13794 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs David Chinner wrote: > On Mon, Nov 26, 2007 at 02:16:34PM +1100, Lachlan McIlroy wrote: >> David Chinner wrote: >>> On Mon, Nov 26, 2007 at 12:29:36PM +1100, Lachlan McIlroy wrote: >>>> David Chinner wrote: >>>>> On Mon, Nov 26, 2007 at 11:19:57AM +1100, Lachlan McIlroy wrote: >>>>>> David Chinner wrote: >>>>>>> On Fri, Nov 23, 2007 at 06:04:39PM +1100, Lachlan McIlroy wrote: >>>>>>>> The easy solution is to log everything so that log replay doesn't need >>>>>>>> to check if the on-disk version is newer - it can just replay the log. >>>>>>>> But logging everything would cause too much log traffic so this patch >>>>>>>> is a compromise and it logs a transaction before we flush an inode to >>>>>>>> disk only if it has changes that have not yet been logged. >>>>>>> The problem with this is that the inode will be marked dirty during the >>>>>>> transaction, so we'll never be able to clean an inode if we issue a >>>>>>> transaction during inode writeback. >>>>>> Ah, yeah, good point. I wrote this patch back before that "dirty inode >>>>>> on transaction" patch went in. >>>>> Wouldn't have made aany difference - the inode woul dbe marked dirty >>>>> at transaction completion... >>>>> >>>>>> For this transaction though the changes >>>>>> to the inode have already been made (ie when we set i_update_core and >>>>>> called mark_inode_dirty_sync()) so there is no need to dirty it in this >>>>>> transaction. I'll keep digging. Thanks. >>>>> I wouldn't worry too much about this problem right now - I'm working >>>>> on moving the dirty state into the inode radix trees so i_update_core >>>>> might even go away completely soon.... 
>>>>> >>>> Which problem? Just the bit about dirtying the inode or will your changes >>>> allow us to log all inode changes? >>> Trying to change XFS to logging all updates. >> That would be great. But what about the increase in log traffic that has >> deterred us from doing this in the past? > > Sorry, i wasn't particularly clear. What I mean was that i_update_core > might disappear completely with the changes I'm making. > > Basically, we have three different methods of marking the inode dirty > at the moment - on the linux inode (mark_inode_dirty[_sync]()), the > i_update_core = 1 for unlogged changes and logged changes are tracked via the > inode log item in the AIL. > > One top of that, we have three different methods of flushing them - one > from the generic code for inodes dirtied by mark_inode_dirty(), one from > xfssyncd for inodes that are only dirtied by setting i_update_core = 1 > and the other from the xfsaild when log tail pushing. > > Ideally we should only have a single method for pushing out inodes. The first > step to that is tracking the dirty state in a single tree (the inode radix > trees). That means we have to hook ->dirty_inode() to catch all dirtying via > mark_inode_dirty[_sync]() and mark the inodes dirty in the radix tree. Then we > need to use xfs_mark_inode_dirty_sync() everywhere that we dirty the inode. Don't we already call mark_inode_dirty[_sync]() everywhere we dirty the inode? > > Once we have all the dirty state in the radix trees we can now get rid of > i_update_core and i_update_size - all they do is mark the inode dirty and > we don't really care about the difference between them(*) - and just use > the dirty bit in the radix tree when necessary. If we want to check if an inode is dirty do we have to look up the dirty bit in the tree or is there some easy way to get it from the inode? 
By consolidating the different ways of dirtying an inode we lose the ability to know why it is dirty and what action needs to be done to undirty it. For example if the inode log item has bits set then we know we have to flush the log otherwise there is no need. With a general purpose dirty bit we will have to flush regardless. And my recent attempt to fix the log replay issue relies on i_update_core to indicate there are unlogged changes - I don't see how that will work with these changes. > > To flush the dirty inodes we just do radix_tree_gang_lookup_tag_range() > calls to do ascending cluster order writeback. This will replace the > mount inode list walking in xfs_sync_inodes() and other places to find > dirty inodes. > > /me puts on flame-proof suit > > I'd even like to go as far as a two pass writeback algorithm; pass > one only writes data, and pass two only writes inodes. The second pass > for XFS needs to be delayed until data writeback is complete because of > delalloc and inode size updates redirtying the inode. The current > mechanism means we often do two inode writes for the one data write... > > Basically, our writeback code is a mess and I want to clean it up > before we try to deal with the unlogged changes.... > > Cheers, > > Dave. > > (*) Even for FDATASYNC we should always force the log out because we may > have delayed allocation transactions still sitting in iclog buffers. This, > AFAICT, is a bug in the current implementation.
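The single-tree dirty tracking and ascending-order writeback described in the quoted message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `DirtyInodeIndex`, `mark_dirty`, `gang_lookup_dirty`, `writeback_all` and the batch size are hypothetical names, a sorted Python list stands in for a tagged radix tree, and none of it is XFS or kernel code.

```python
import bisect


class DirtyInodeIndex:
    """Toy stand-in for an inode radix tree carrying a single 'dirty' tag."""

    def __init__(self):
        self._dirty = []  # sorted inode numbers currently tagged dirty

    def mark_dirty(self, ino):
        # Every dirtying path funnels through this one call (assumption).
        pos = bisect.bisect_left(self._dirty, ino)
        if pos == len(self._dirty) or self._dirty[pos] != ino:
            self._dirty.insert(pos, ino)

    def is_clean(self, ino):
        pos = bisect.bisect_left(self._dirty, ino)
        return pos == len(self._dirty) or self._dirty[pos] != ino

    def gang_lookup_dirty(self, first_ino, max_items):
        # Return up to max_items dirty inodes >= first_ino, in ascending order.
        pos = bisect.bisect_left(self._dirty, first_ino)
        return self._dirty[pos:pos + max_items]

    def clear_dirty(self, ino):
        pos = bisect.bisect_left(self._dirty, ino)
        if pos < len(self._dirty) and self._dirty[pos] == ino:
            del self._dirty[pos]


def writeback_all(index, batch=2):
    """Flush dirty inodes in ascending order, one small batch per lookup."""
    written = []
    next_ino = 0
    while True:
        found = index.gang_lookup_dirty(next_ino, batch)
        if not found:
            return written
        for ino in found:
            written.append(ino)      # stands in for writing the inode to disk
            index.clear_dirty(ino)
        next_ino = found[-1] + 1
```

In this model a single tag answers "is this inode dirty?", but, as the reply above points out, it no longer says why the inode is dirty; that information would have to come from elsewhere (for example the inode log item).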
From owner-xfs@oss.sgi.com Tue Nov 27 02:53:58 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 27 Nov 2007 02:54:02 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lARArrnm015191 for ; Tue, 27 Nov 2007 02:53:57 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id VAA23659; Tue, 27 Nov 2007 21:54:00 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lARArxdD122428717; Tue, 27 Nov 2007 21:54:00 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lARArwA7108645824; Tue, 27 Nov 2007 21:53:58 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Tue, 27 Nov 2007 21:53:58 +1100 From: David Chinner To: Lachlan McIlroy Cc: David Chinner , xfs-dev , xfs-oss Subject: Re: [PATCH, RFC] Delayed logging of file sizes Message-ID: <20071127105358.GG119954183@sgi.com> References: <47467B87.2000000@sgi.com> <20071125225928.GE114266761@sgi.com> <474A112D.2040006@sgi.com> <20071126011044.GG114266761@sgi.com> <474A2180.7000605@sgi.com> <20071126021515.GH114266761@sgi.com> <474A3A92.2040200@sgi.com> <20071126050300.GI114266761@sgi.com> <474B8F51.5030102@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <474B8F51.5030102@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4928/Mon Nov 26 10:10:39 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13797 X-ecartis-version: Ecartis v1.0.0 Sender:
xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Tue, Nov 27, 2007 at 02:30:25PM +1100, Lachlan McIlroy wrote: > David Chinner wrote: > >Sorry, i wasn't particularly clear. What I mean was that i_update_core > >might disappear completely with the changes I'm making. > > > >Basically, we have three different methods of marking the inode dirty > >at the moment - on the linux inode (mark_inode_dirty[_sync]()), the > >i_update_core = 1 for unlogged changes and logged changes are tracked via > >the > >inode log item in the AIL. > > > >One top of that, we have three different methods of flushing them - one > >from the generic code for inodes dirtied by mark_inode_dirty(), one from > >xfssyncd for inodes that are only dirtied by setting i_update_core = 1 > >and the other from the xfsaild when log tail pushing. > > > >Ideally we should only have a single method for pushing out inodes. The > >first > >step to that is tracking the dirty state in a single tree (the inode radix > >trees). That means we have to hook ->dirty_inode() to catch all dirtying > >via > >mark_inode_dirty[_sync]() and mark the inodes dirty in the radix tree. > >Then we > >need to use xfs_mark_inode_dirty_sync() everywhere that we dirty the inode. > Don't we already call mark_inode_dirty[_sync]() everywhere we dirty the > inode? Maybe. Maybe not. Tell me - does xfs_ichgtime() do the right thing? [ I do know the answer to this question and there's a day of kdb tracing behind the answer. I wrote a 15 line comment to explain what was going on in one of my patches. ] > >Once we have all the dirty state in the radix trees we can now get rid of > >i_update_core and i_update_size - all they do is mark the inode dirty and > >we don't really care about the difference between them(*) - and just use > >the dirty bit in the radix tree when necessary. 
> If we want to check if an inode is dirty do we have to look up the dirty > bit in the tree or is there some easy way to get it from the inode? xfs_inode_clean(ip) is my preferred interface. How that is finally implemented will be determined by how this all cleans up and what performs the best. If lockless tree lookups don't cause performance problems, then there is little reason to keep redundant information around. > By consolidating the different ways of dirtying an inode we lose the ability > to know why it is dirty and what action needs to be done to undirty it. The only way to undirty an inode is to write it to disk. > For example if the inode log item has bits set then we know we have to flush > the log otherwise there is no need. With a general purpose dirty bit we No, if the log item is present and dirty (i.e. inode is in the AIL), all it means is that we need to attach a callback to the buffer (xfs_iflush_done) when dispatching the I/O to do processing of the log item on I/O completion. Whether i_update_core is set or not in this case is irrelevant - the log item state overrides that. > will > have to flush regardless. And my recent attempt to fix the log replay issue > relies on i_update_core to indicate there are unlogged changes - I don't see > how that will work with these changes. But your changes could not be implemented, either. You can't log the inode to clean it - it merely transfers the writeback from one list to another. So, the cleaner fix is to do this - change the xfs_inode_flush() just to unconditionally log the inode and don't do inode writeback *at all* from there. That will catch all cases of unlogged changes and leave inode writeback to tail-pushing or xfssyncd which can be driven by the radix tree. Basically, if we only ever write state to disk that we've logged, then we are home free. 
That means the only time we should update the unlogged fields - timestamps and inode size - is during a transaction commit and not during inode writeback. If we do that then i_update_core and i_update_size go away completely and the only place we need to track inode dirty state in XFS is when the inodes are in the AIL list. Even better: this removes one of the three places where we do inode writeback and is a significant step towards: > >I'd even like to go as far as a two pass writeback algorithm; pass > >one only writes data, and pass two only writes inodes. The second pass > >for XFS needs to be delayed until data writeback is complete because of > >delalloc and inode size updates redirtying the inode. The current > >mechanism means we often do two inode writes for the one data write... ---- What I'm trying to say is that I don't think we can cleanly fix the problem with the current structure, so let's not waste time on it. A cleaner fix should just fall out of a simpler writeback structure. Cheers, Dave.
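The "two inode writes for the one data write" effect quoted above can be shown with a toy count. This is a deliberately simplified sketch, not the XFS writeback path: the function names and the dict-based file records are hypothetical, and the model assumes that completing a data write always redirties the inode (standing in for delalloc conversion and size updates).

```python
def interleaved_writeback(files):
    """Write each dirty inode when it is found, then its data.

    Data completion redirties the inode, so a later sweep has to
    write the same inode a second time.
    """
    inode_writes = 0
    for f in files:
        if f["inode_dirty"]:
            inode_writes += 1
            f["inode_dirty"] = False
        if f["data_dirty"]:
            f["data_dirty"] = False
            f["inode_dirty"] = True   # size update redirties the inode
    for f in files:                   # later sweep cleans redirtied inodes
        if f["inode_dirty"]:
            inode_writes += 1
            f["inode_dirty"] = False
    return inode_writes


def two_pass_writeback(files):
    """Pass one writes only data; pass two writes only inodes."""
    inode_writes = 0
    for f in files:                   # pass one: data only
        if f["data_dirty"]:
            f["data_dirty"] = False
            f["inode_dirty"] = True
    for f in files:                   # pass two: each inode written once
        if f["inode_dirty"]:
            inode_writes += 1
            f["inode_dirty"] = False
    return inode_writes


def make_files(n):
    """Build n files whose data and inode both start out dirty."""
    return [{"inode_dirty": True, "data_dirty": True} for _ in range(n)]
```

Under these assumptions the interleaved order writes every inode twice, while the two-pass order writes each inode once after its data has completed.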
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Tue Nov 27 07:28:28 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 27 Nov 2007 07:28:32 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.3 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from ogre.sisk.pl (ogre.sisk.pl [217.79.144.158]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lARFSQJJ005166 for ; Tue, 27 Nov 2007 07:28:28 -0800 Received: from localhost (localhost.localdomain [127.0.0.1]) by ogre.sisk.pl (Postfix) with ESMTP id 1A90173438; Tue, 27 Nov 2007 16:26:00 +0100 (CET) Received: from ogre.sisk.pl ([127.0.0.1]) by localhost (ogre.sisk.pl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 09545-01; Tue, 27 Nov 2007 16:25:47 +0100 (CET) Received: from [192.168.2.11] (sowa1.fuw.edu.pl [193.0.83.121]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by ogre.sisk.pl (Postfix) with ESMTP id 4BD9A732AD; Tue, 27 Nov 2007 16:25:47 +0100 (CET) From: "Rafael J. 
Wysocki" To: Tino Keitel Subject: Re: XFS related Oops (suspend/resume related) Date: Tue, 27 Nov 2007 16:46:53 +0100 User-Agent: KMail/1.9.6 (enterprise 20070904.708012) Cc: linux-kernel@vger.kernel.org, David Chinner , xfs@oss.sgi.com References: <20071112064706.GA23595@dose.home.local> <200711262307.56742.rjw@sisk.pl> <20071127132000.GA31893@dose.home.local> In-Reply-To: <20071127132000.GA31893@dose.home.local> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200711271646.54331.rjw@sisk.pl> X-Virus-Scanned: ClamAV 0.91.2/4932/Tue Nov 27 05:14:26 2007 on oss.sgi.com X-Virus-Scanned: amavisd-new at ogre.sisk.pl using MkS_Vir for Linux X-Virus-Status: Clean X-archive-position: 13798 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: rjw@sisk.pl Precedence: bulk X-list: xfs On Tuesday, 27 of November 2007, Tino Keitel wrote: > On Mon, Nov 26, 2007 at 23:07:56 +0100, Rafael J. Wysocki wrote: > > [...] > > > On 2.6.23.1 you can test the freezer alone by doing > > > > # echo testproc > /sys/power/disk > > # echo disk > /sys/power/state > > This is suspend to RAM, not to disk. I know. :-) Nevertheless, this is how you can test the tasks freezer _without_ actually doing a suspend of any kind. 
Greetings, Rafael From owner-xfs@oss.sgi.com Tue Nov 27 07:33:13 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 27 Nov 2007 07:33:16 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from ogre.sisk.pl (ogre.sisk.pl [217.79.144.158]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lARFX8BI006458 for ; Tue, 27 Nov 2007 07:33:11 -0800 Received: from localhost (localhost.localdomain [127.0.0.1]) by ogre.sisk.pl (Postfix) with ESMTP id 08F627352F; Tue, 27 Nov 2007 16:30:42 +0100 (CET) Received: from ogre.sisk.pl ([127.0.0.1]) by localhost (ogre.sisk.pl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 09537-06; Tue, 27 Nov 2007 16:30:32 +0100 (CET) Received: from [192.168.2.11] (sowa1.fuw.edu.pl [193.0.83.121]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by ogre.sisk.pl (Postfix) with ESMTP id 185026BAD8; Tue, 27 Nov 2007 16:30:32 +0100 (CET) From: "Rafael J. 
Wysocki" To: David Chinner Subject: Re: XFS related Oops (suspend/resume related) Date: Tue, 27 Nov 2007 16:51:38 +0100 User-Agent: KMail/1.9.6 (enterprise 20070904.708012) Cc: linux-kernel@vger.kernel.org, xfs@oss.sgi.com References: <20071112064706.GA23595@dose.home.local> <20071126210844.GB119954183@sgi.com> <200711262307.56742.rjw@sisk.pl> In-Reply-To: <200711262307.56742.rjw@sisk.pl> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200711271651.39180.rjw@sisk.pl> X-Virus-Scanned: ClamAV 0.91.2/4932/Tue Nov 27 05:14:26 2007 on oss.sgi.com X-Virus-Scanned: amavisd-new at ogre.sisk.pl using MkS_Vir for Linux X-Virus-Status: Clean X-archive-position: 13799 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: rjw@sisk.pl Precedence: bulk X-list: xfs On Monday, 26 of November 2007, Rafael J. Wysocki wrote: > On Monday, 26 of November 2007, David Chinner wrote: > > On Mon, Nov 26, 2007 at 02:12:10PM +0100, Tino Keitel wrote: > > > On Wed, Nov 14, 2007 at 10:04:45 +1100, David Chinner wrote: > > > > On Tue, Nov 13, 2007 at 11:51:19AM +0100, Tino Keitel wrote: > > > > > On Tue, Nov 13, 2007 at 09:27:20 +1100, David Chinner wrote: > > > > > > > > > > [...] > > > > > > > > > > > No. I'd say something got screwed up during suspend/resume. Is it > > > > > > reproducible? > > > > > > > > > > No. I often use suspend to RAM, and usually it works without such > > > > > failures. I restart squid during the resume procedure, and the above > > > > > Oops lead to a squid in D state. > > > > > > > > Ok. Sounds like there's not much we can debug at this point. Thanks > > > > for the report, though. > > > > > > I got a similar Oops again: > > > > > > xfs_iget_core: ambiguous vns: vp/0xc00700c0, invp/0xcb5a1680 > > > > Now there's a message that I haven't seen in about 3 years.
> > > > It indicates that the linux inode connected to the xfs_inode is not > > the correct one. i.e. that the linux inode cache is out of step with > > the XFS inode cache. > > > > Basically, that is not supposed to happen. I suspect that the way > > threads are frozen is resulting in an inode lookup racing with > > a reclaim. The reclaim thread gets stopped after any use threads, > > and so we could have the situation that a process blocked in lookup > > has the XFS inode reclaimed and reused before it gets unblocked. > > > > The question is why is it happening now when none of that code in > > XFS has changed? > > > > Rafael, when are threads frozen? Only when they schedule or call > > try_to_freeze()? > > Kernel threads freeze only when they call try_to_freeze(). User space tasks > freeze while executing the signals handling code. > > > Did the freezer mechanism change in 2.6.23 (this is on 2.6.23.1)? > > Yes. Kernel threads are not sent fake signals by the freezer any more. Ah, sorry, this change has been merged after 2.6.23. However, before 2.6.23 we had another important change that caused all kernel threads to have PF_NOFREEZE set by default, unless they call set_freezable() explicitly. 
Greetings, Rafael From owner-xfs@oss.sgi.com Tue Nov 27 11:43:11 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 27 Nov 2007 11:43:15 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from smtp116.sbc.mail.sp1.yahoo.com (smtp116.sbc.mail.sp1.yahoo.com [69.147.64.89]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lARJh7aa022425 for ; Tue, 27 Nov 2007 11:43:10 -0800 Received: (qmail 70462 invoked from network); 27 Nov 2007 19:43:15 -0000 Received: from unknown (HELO stupidest.org) (cwedgwood@sbcglobal.net@75.37.32.112 with login) by smtp116.sbc.mail.sp1.yahoo.com with SMTP; 27 Nov 2007 19:43:15 -0000 X-YMail-OSG: 2cj0w8UVM1lpaCOoCdgMFWRDQCN.Qj0HAA5zRH1ngH9bvpjVqKGCNLesGK00FR9HyQFmOwAt6A-- Received: by tuatara.stupidest.org (Postfix, from userid 10000) id 2D647284303F; Tue, 27 Nov 2007 11:43:14 -0800 (PST) Date: Tue, 27 Nov 2007 11:43:14 -0800 From: Chris Wedgwood To: Christoph Hellwig Cc: linux-xfs@oss.sgi.com, LKML Subject: Re: [PATCH] xfs: revert to double-buffering readdir Message-ID: <20071127194314.GA4939@puku.stupidest.org> References: <20071114070400.GA25708@puku.stupidest.org> <20071125163014.GA17922@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071125163014.GA17922@infradead.org> X-Virus-Scanned: ClamAV 0.91.2/4932/Tue Nov 27 05:14:26 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13800 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: cw@f00f.org Precedence: bulk X-list: xfs On Sun, Nov 25, 2007 at 04:30:14PM +0000, Christoph Hellwig wrote: > The current readdir implementation deadlocks on a btree buffers > locks because nfsd calls back into ->lookup from the filldir > callback. 
The only short-term fix for this is to revert to the old > inefficient double-buffering scheme. This seems to work really well here. > This patch does exactly that and reverts xfs_file_readdir to what's > basically the 2.6.23 version minus the uio and vnops junk. This should probably be submitted for inclusion stable-2.6.24. Perhaps a version with the #if 0 [...] stuff dropped? (I'm happy to send a patch for that if you prefer). From owner-xfs@oss.sgi.com Tue Nov 27 13:12:01 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 27 Nov 2007 13:12:06 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lARLBvvf008710 for ; Tue, 27 Nov 2007 13:12:00 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id IAA11466; Wed, 28 Nov 2007 08:12:03 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lARLC0dD122520008; Wed, 28 Nov 2007 08:12:02 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lARLBwrJ123297389; Wed, 28 Nov 2007 08:11:58 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Wed, 28 Nov 2007 08:11:58 +1100 From: David Chinner To: "Rafael J. 
Wysocki" Cc: David Chinner , linux-kernel@vger.kernel.org, xfs@oss.sgi.com Subject: Re: XFS related Oops (suspend/resume related) Message-ID: <20071127211155.GK119954183@sgi.com> References: <20071112064706.GA23595@dose.home.local> <20071126210844.GB119954183@sgi.com> <200711262307.56742.rjw@sisk.pl> <200711271651.39180.rjw@sisk.pl> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200711271651.39180.rjw@sisk.pl> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4933/Tue Nov 27 11:10:57 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13801 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Tue, Nov 27, 2007 at 04:51:38PM +0100, Rafael J. Wysocki wrote: > On Monday, 26 of November 2007, Rafael J. Wysocki wrote: > > On Monday, 26 of November 2007, David Chinner wrote: > > > Now there's a message that I haven't seen in about 3 years. > > > > > > It indicates that the linux inode connected to the xfs_inode is not > > > the correct one. i.e. that the linux inode cache is out of step with > > > the XFS inode cache. > > > > > > Basically, that is not supposed to happen. I suspect that the way > > > threads are frozen is resulting in an inode lookup racing with > > > a reclaim. The reclaim thread gets stopped after any use threads, > > > and so we could have the situation that a process blocked in lookup > > > has the XFS inode reclaimed and reused before it gets unblocked. > > > > > > The question is why is it happening now when none of that code in > > > XFS has changed? > > > > > > Rafael, when are threads frozen? Only when they schedule or call > > > try_to_freeze()? > > > > Kernel threads freeze only when they call try_to_freeze(). User space tasks > > freeze while executing the signals handling code. > > > > > Did the freezer mechanism change in 2.6.23 (this is on 2.6.23.1)? 
> > > > Yes. Kernel threads are not sent fake signals by the freezer any more. > > Ah, sorry, this change has been merged after 2.6.23. However, before 2.6.23 > we had another important change that caused all kernel threads to have > PF_NOFREEZE set by default, unless they call set_freezable() explicitly. So try_to_freeze() will never freeze a thread if it has not been set_freezable()? And xfsbufd will never be frozen? Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Tue Nov 27 13:34:55 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 27 Nov 2007 13:34:57 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: * X-Spam-Status: No, score=1.9 required=5.0 tests=AWL,BAYES_00, RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_PSBL autolearn=no version=3.3.0-r574664 Received: from ogre.sisk.pl (ogre.sisk.pl [217.79.144.158]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lARLYqev011980 for ; Tue, 27 Nov 2007 13:34:54 -0800 Received: from localhost (localhost.localdomain [127.0.0.1]) by ogre.sisk.pl (Postfix) with ESMTP id 37C676B74B; Tue, 27 Nov 2007 22:32:12 +0100 (CET) Received: from ogre.sisk.pl ([127.0.0.1]) by localhost (ogre.sisk.pl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 12710-09; Tue, 27 Nov 2007 22:31:56 +0100 (CET) Received: from [192.168.100.119] (nat-be3.aster.pl [212.76.37.200]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by ogre.sisk.pl (Postfix) with ESMTP id EF0B761171; Tue, 27 Nov 2007 22:31:55 +0100 (CET) From: "Rafael J. 
Wysocki" To: David Chinner Subject: Re: XFS related Oops (suspend/resume related) Date: Tue, 27 Nov 2007 22:53:00 +0100 User-Agent: KMail/1.9.6 (enterprise 20070904.708012) Cc: linux-kernel@vger.kernel.org, xfs@oss.sgi.com, Tino Keitel References: <20071112064706.GA23595@dose.home.local> <200711271651.39180.rjw@sisk.pl> <20071127211155.GK119954183@sgi.com> In-Reply-To: <20071127211155.GK119954183@sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200711272253.01136.rjw@sisk.pl> X-Virus-Scanned: ClamAV 0.91.2/4933/Tue Nov 27 11:10:57 2007 on oss.sgi.com X-Virus-Scanned: amavisd-new at ogre.sisk.pl using MkS_Vir for Linux X-Virus-Status: Clean X-archive-position: 13802 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: rjw@sisk.pl Precedence: bulk X-list: xfs On Tuesday, 27 of November 2007, David Chinner wrote: > On Tue, Nov 27, 2007 at 04:51:38PM +0100, Rafael J. Wysocki wrote: > > On Monday, 26 of November 2007, Rafael J. Wysocki wrote: > > > On Monday, 26 of November 2007, David Chinner wrote: > > > > Now there's a message that I haven't seen in about 3 years. > > > > > > > > It indicates that the linux inode connected to the xfs_inode is not > > > > the correct one. i.e. that the linux inode cache is out of step with > > > > the XFS inode cache. > > > > > > > > Basically, that is not supposed to happen. I suspect that the way > > > > threads are frozen is resulting in an inode lookup racing with > > > > a reclaim. The reclaim thread gets stopped after any use threads, > > > > and so we could have the situation that a process blocked in lookup > > > > has the XFS inode reclaimed and reused before it gets unblocked. > > > > > > > > The question is why is it happening now when none of that code in > > > > XFS has changed? > > > > > > > > Rafael, when are threads frozen? 
Only when they schedule or call > > > > try_to_freeze()? > > > > > > Kernel threads freeze only when they call try_to_freeze(). User space tasks > > > freeze while executing the signals handling code. > > > > > > > Did the freezer mechanism change in 2.6.23 (this is on 2.6.23.1)? > > > > > > Yes. Kernel threads are not sent fake signals by the freezer any more. > > > > Ah, sorry, this change has been merged after 2.6.23. However, before 2.6.23 > > we had another important change that caused all kernel threads to have > > PF_NOFREEZE set by default, unless they call set_freezable() explicitly. > > So try_to_freeze() will never freeze a thread if it has not been > set_freezable()? And xfsbufd will never be frozen?

No, it won't. I must have overlooked it, probably because it calls refrigerator() directly and not try_to_freeze() ... I think something like the appended patch will help, then.

Greetings,
Rafael

---
Fix breakage caused by commit 831441862956fffa17b9801db37e6ea1650b0f69 that did not introduce the necessary call to set_freezable() in xfs/linux-2.6/xfs_buf.c.

Signed-off-by: Rafael J. Wysocki
---
 fs/xfs/linux-2.6/xfs_buf.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6/fs/xfs/linux-2.6/xfs_buf.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_buf.c
+++ linux-2.6/fs/xfs/linux-2.6/xfs_buf.c
@@ -1750,6 +1750,8 @@ xfsbufd(
 
 	current->flags |= PF_MEMALLOC;
 
+	set_freezable();
+
 	do {
 		if (unlikely(freezing(current))) {
 			set_bit(XBT_FORCE_SLEEP, &target->bt_flags);

From owner-xfs@oss.sgi.com Tue Nov 27 13:43:13 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 27 Nov 2007 13:43:16 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.0 required=5.0 tests=BAYES_50,RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.0-r574664 Received: from av9-2-sn2.hy.skanova.net (av9-2-sn2.hy.skanova.net [81.228.8.180]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lARLhAxj013431 for ; Tue, 27 Nov 2007 13:43:12 -0800 Received: by av9-2-sn2.hy.skanova.net (Postfix, from userid 502) id CEAE238E6E; Tue, 27 Nov 2007 22:20:50 +0100 (CET) Received: from smtp4-2-sn2.hy.skanova.net (smtp4-2-sn2.hy.skanova.net [81.228.8.93]) by av9-2-sn2.hy.skanova.net (Postfix) with ESMTP id 8DA1938E6D for ; Tue, 27 Nov 2007 22:20:50 +0100 (CET) Received: from cobra.e-626.net (h193n1fls32o1110.telia.com [213.67.141.193]) by smtp4-2-sn2.hy.skanova.net (Postfix) with ESMTP id 7657137E4A for ; Tue, 27 Nov 2007 22:20:50 +0100 (CET) Received: from [192.168.1.9] (h193n1fls32o1110.telia.com [213.67.141.193]) (authenticated bits=0) by cobra.e-626.net (8.14.0/8.14.0) with ESMTP id lARLKfTr022579 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT) for ; Tue, 27 Nov 2007 22:20:46 +0100 Message-ID: <474C8A05.3020604@e-626.net> Date: Tue, 27 Nov 2007 22:20:05 +0100 From: Johan Andersson User-Agent: Thunderbird 2.0.0.9 (Windows/20071031) MIME-Version: 1.0 To: xfs@oss.sgi.com Subject: XFS performance problems on Linux x86_64
Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4933/Tue Nov 27 11:10:57 2007 on oss.sgi.com X-Virus-Scanned: ClamAV 0.91.2/4933/Tue Nov 27 20:10:57 2007 on cobra.e-626.net X-Virus-Status: Clean X-archive-position: 13803 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: johan@e-626.net Precedence: bulk X-list: xfs Hi! I am using Gentoo Linux on XFS root filesystem on a number of machines, where some are P4 based i686, and some new are Intel Core 2 Duo based x86_64 based. When the new x86_64 based machines were put into service, we noticed that they are extremely slow on file io. I have now created two test partitions, each 5G in size, on the same disk. One is xfs and one is ext3, both filesystems created with default options. My simple test is to rsync our local portage tree to the 5G partition: ===================================================================== tmpc-masv2 xfs # time rsync -r --delete rsync://devsrv/portage portage real 5m55.037s user 0m1.291s sys 0m10.352s ====================================================================== tmpc-masv2 ext3 # time rsync -r --delete rsync://devsrv/portage portage real 0m28.943s user 0m1.095s sys 0m5.384s I have repeated this a number of times to make sure caching on the server does not interfere, with about the same results every time. Any idea why XFS appears to be 12 times slower than ext3 on the 64-bit machine? 
I also have some statistics from bonnie++:

XFS:
> Version 1.93c       ------Sequential Output------ --Sequential Input- --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> tmpc-masv2       4G   929  99 48914   8 23036   3  1872  96 50322   4 162.0   1
> Latency                8913us    1675ms     492ms   54567us     161ms     503ms
> Version 1.93c       ------Sequential Create------ --------Random Create--------
> tmpc-masv2          -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16  3241  13 +++++ +++  3541  13  3729  14 +++++ +++  1001   4
> Latency               60600us      80us   34066us   82412us      22us     269ms

EXT3:
> Version 1.93c       ------Sequential Output------ --Sequential Input- --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> tmpc-masv2       4G   581  98 43340   9 22933   4  2435  96 50829   4 153.5   1
> Latency               56412us    2111ms    1885ms   41179us     101ms     690ms
> Version 1.93c       ------Sequential Create------ --------Random Create--------
> tmpc-masv2          -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16 31286  38 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
> Latency               11233us     145us     165us    7555us       8us      40us

From these numbers, XFS performs OK (though not as well as expected) on large
files, but creating and deleting files is extremely slow.

The machine these tests run on uses Gentoo kernel sources 2.6.23-gentoo-r1
(also tested with 2.6.22-gentoo-r8). xfsprogs is 2.9.4.
/Johan Andersson From owner-xfs@oss.sgi.com Tue Nov 27 14:05:39 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 27 Nov 2007 14:05:42 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_43 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lARM5ZPd016305 for ; Tue, 27 Nov 2007 14:05:37 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA13471; Wed, 28 Nov 2007 09:05:40 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lARM5cdD122877538; Wed, 28 Nov 2007 09:05:39 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lARM5aiK123544671; Wed, 28 Nov 2007 09:05:36 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Wed, 28 Nov 2007 09:05:36 +1100 From: David Chinner To: Johan Andersson Cc: xfs@oss.sgi.com Subject: Re: XFS performance problems on Linux x86_64 Message-ID: <20071127220536.GL119954183@sgi.com> References: <474C8A05.3020604@e-626.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <474C8A05.3020604@e-626.net> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4933/Tue Nov 27 11:10:57 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13804 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Tue, Nov 27, 2007 at 10:20:05PM +0100, Johan Andersson wrote: > Hi! 
>
> I am using Gentoo Linux on XFS root filesystem on a number of machines,
> where some are P4 based i686, and some new are Intel Core 2 Duo based
> x86_64 based.
> When the new x86_64 based machines were put into service, we noticed
> that they are extremely slow on file io. I have now created two test
> partitions, each 5G in size, on the same disk. One is xfs and one is
> ext3, both filesystems created with default options. My simple test is
> to rsync our local portage tree to the 5G partition:
> =====================================================================
> tmpc-masv2 xfs # time rsync -r --delete rsync://devsrv/portage portage
>
> real 5m55.037s
> user 0m1.291s
> sys 0m10.352s
>
> ======================================================================
> tmpc-masv2 ext3 # time rsync -r --delete rsync://devsrv/portage portage
>
> real 0m28.943s
> user 0m1.095s
> sys 0m5.384s
>
> I have repeated this a number of times to make sure caching on the
> server does not interfere, with about the same results every time.
>
> Any idea why XFS appears to be 12 times slower than ext3 on the 64-bit
> machine?

# mkfs.xfs -f -l lazy-count=1,version=2,size=128m -i attr=2 -d agcount=4
# mount -o logbsize=256k

And if you don't care about filesystem corruption on power loss:

# mount -o logbsize=256k,nobarrier

Those mkfs values (except for log size) will be the defaults in the next
release of xfsprogs.

Cheers,

Dave.
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Tue Nov 27 16:10:05 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 27 Nov 2007 16:10:09 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.5 required=5.0 tests=BAYES_00,J_CHICKENPOX_43, SPF_HELO_PASS autolearn=no version=3.3.0-r574664 Received: from ciao.gmane.org (main.gmane.org [80.91.229.2]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAS0A1jj032427 for ; Tue, 27 Nov 2007 16:10:05 -0800 Received: from list by ciao.gmane.org with local (Exim 4.43) id 1Ix9dO-0004rY-VQ for linux-xfs@oss.sgi.com; Tue, 27 Nov 2007 23:14:30 +0000 Received: from p57b4c6bd.dip.t-dialin.net ([87.180.198.189]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Tue, 27 Nov 2007 23:14:30 +0000 Received: from bernd-schubert by p57b4c6bd.dip.t-dialin.net with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Tue, 27 Nov 2007 23:14:30 +0000 X-Injected-Via-Gmane: http://gmane.org/ To: linux-xfs@oss.sgi.com From: Bernd Schubert Subject: Re: XFS performance problems on Linux x86_64 Date: Wed, 28 Nov 2007 00:13:57 +0100 Lines: 15 Message-ID: References: <474C8A05.3020604@e-626.net> <20071127220536.GL119954183@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7Bit X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: p57b4c6bd.dip.t-dialin.net User-Agent: KNode/0.10.5 X-Virus-Scanned: ClamAV 0.91.2/4933/Tue Nov 27 11:10:57 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13805 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: bernd-schubert@gmx.de Precedence: bulk X-list: xfs Hello David, David Chinner wrote: > > # mkfs.xfs -f -l lazy-count=1,version=2,size=128m -i attr=2 -d agcount=4 > # mount -o logbsize=256k thanks, I was also 
going to ask which are optimal parameters. Just didn't have the time yet :) Any idea when these options will be default? Cheers, Bernd From owner-xfs@oss.sgi.com Tue Nov 27 16:44:25 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 27 Nov 2007 16:45:28 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAS0iKqA009435 for ; Tue, 27 Nov 2007 16:44:22 -0800 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA19478; Wed, 28 Nov 2007 11:44:23 +1100 Message-ID: <474CB9AE.9020604@sgi.com> Date: Wed, 28 Nov 2007 11:43:26 +1100 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.9 (X11/20071031) MIME-Version: 1.0 To: David Chinner CC: xfs-dev , xfs-oss Subject: Re: [PATCH, RFC] Delayed logging of file sizes References: <47467B87.2000000@sgi.com> <20071125225928.GE114266761@sgi.com> <474A112D.2040006@sgi.com> <20071126011044.GG114266761@sgi.com> <474A2180.7000605@sgi.com> <20071126021515.GH114266761@sgi.com> <474A3A92.2040200@sgi.com> <20071126050300.GI114266761@sgi.com> <474B8F51.5030102@sgi.com> <20071127105358.GG119954183@sgi.com> In-Reply-To: <20071127105358.GG119954183@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4933/Tue Nov 27 11:10:57 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13806 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs David Chinner wrote: > On Tue, Nov 27, 2007 at 02:30:25PM +1100, Lachlan McIlroy wrote: >> David 
Chinner wrote: >>> Sorry, i wasn't particularly clear. What I mean was that i_update_core >>> might disappear completely with the changes I'm making. >>> >>> Basically, we have three different methods of marking the inode dirty >>> at the moment - on the linux inode (mark_inode_dirty[_sync]()), the >>> i_update_core = 1 for unlogged changes and logged changes are tracked via >>> the >>> inode log item in the AIL. >>> >>> One top of that, we have three different methods of flushing them - one >> >from the generic code for inodes dirtied by mark_inode_dirty(), one from >>> xfssyncd for inodes that are only dirtied by setting i_update_core = 1 >>> and the other from the xfsaild when log tail pushing. >>> >>> Ideally we should only have a single method for pushing out inodes. The >>> first >>> step to that is tracking the dirty state in a single tree (the inode radix >>> trees). That means we have to hook ->dirty_inode() to catch all dirtying >>> via >>> mark_inode_dirty[_sync]() and mark the inodes dirty in the radix tree. >>> Then we >>> need to use xfs_mark_inode_dirty_sync() everywhere that we dirty the inode. >> Don't we already call mark_inode_dirty[_sync]() everywhere we dirty the >> inode? > > Maybe. Maybe not. Tell me - does xfs_ichgtime() do the right thing? > > [ I do know the answer to this question and there's a day of kdb tracing > behind the answer. I wrote a 15 line comment to explain what was going > on in one of my patches. ] Are you referring to the !(inode->i_state & I_LOCK) check? Anyway, since you know the answer why don't you enlighten me? > >>> Once we have all the dirty state in the radix trees we can now get rid of >>> i_update_core and i_update_size - all they do is mark the inode dirty and >>> we don't really care about the difference between them(*) - and just use >>> the dirty bit in the radix tree when necessary. 
>> If we want to check if an inode is dirty do we have to look up the dirty >> bit in the tree or is there some easy way to get it from the inode? > > xfs_inode_clean(ip) is my preferred interface. How that is finally > implemented will be determined by how this all cleans up and what > performs the best. If lockless tree lookups don't cause performance > problems, then there is little reason to keep redundant information > around. I can't imagine that a tree lookup (lockless or not) would be faster than dereferencing fields from the inode. If keeping the inode's dirty flags and the ones in the radix tree in sync is an issue then maybe tree lookups are a performance hit we can live with. > >> By consolidating the different ways of dirtying an inode we lose the ability >> to know why it is dirty and what action needs to be done to undirty it. > > The only way to undirty an inode is to write it to disk. True. I was thinking about what may need to be done before we write it to disk such as flushing the log but that would just be dependent on whether the inode is pinned? > >> For example if the inode log item has bits set then we know we have to flush >> the log otherwise there is no need. With a general purpose dirty bit we > > No, if the log item is present and dirty (i.e. inode is in the AIL), > all it means is that we need to attach a callback to the buffer > (xfs_iflush_done) when dispatching the I/O to do processing of the > log item on I/O completion. Whether i_update_core is set or not > in this case is irrelevant - the log item state overrides that. > >> will >> have to flush regardless. And my recent attempt to fix the log replay issue >> relies on i_update_core to indicate there are unlogged changes - I don't see >> how that will work with these changes. > > But your changes could not be implemented, either. You can't log the inode > to clean it - it merely transfers the writeback from one list to > another. Could not be implemented? 
What was that patch I sent around then? It was implemented and it did work - it got XFSQA test 182 to finally pass. But sure it wasn't an ideal approach. I even fixed it so that it didn't dirty the inode during the transaction. > > So, the cleaner fix is to do this - change the xfs_inode_flush() > just to unconditionally log the inode and don't do inode writeback *at > all* from there. That will catch all cases of unlogged changes and leave > inode writeback to tail-pushing or xfssyncd which can be driven by > the radix tree. Huh? Aren't we trying to minimize the number of transactions we do? My changes introduce new transactions but only when we have to. You're saying here that we log the inode unconditionally - how is that better? I'm not trying to defend my changes here (I don't care how the problem gets fixed) - I'm just trying to understand why your suggestions are a good idea. I do like the way it simplifies inode writeback though - a sync would optionally log all the inodes and then just flush the log and that's it (I think). > > Basically, if we only ever write state to disk that we've logged, > then we are home free. That means the only time we should update > the unlogged fields - timestamps and inode size - is during a > transaction commit and not during inode writeback. If we do that > then i_update_core and i_update_size go away completely and the > only place we need track inode dirty state in XFS is when the > inodes are in the AIL list. > > Even better: this removes one of the three places where we do inode > writeback and is a significant step towards: > >>> I'd even like to go as far as a two pass writeback algorithm; pass >>> one only writes data, and pass two only writes inodes. The second pass >>> for XFS needs to be delayed until data writeback is complete because of >>> delalloc and inode size updates redirtying the inode. The current >>> mechanism means we often do two inode writes for the one data write... 
> > ---- > > What I'm trying to say is that I don't think we can cleanly fix the problem > with the current structure, so let's not waste time on it. A cleaner > fix should just fall out a simpler writeback structure. > Fair enough. I'll wait for the patches. From owner-xfs@oss.sgi.com Tue Nov 27 18:01:34 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 27 Nov 2007 18:01:37 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAS21ToJ026063 for ; Tue, 27 Nov 2007 18:01:31 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA21744; Wed, 28 Nov 2007 13:01:37 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAS21adD123427496; Wed, 28 Nov 2007 13:01:37 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAS21all123672184; Wed, 28 Nov 2007 13:01:36 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Wed, 28 Nov 2007 13:01:36 +1100 From: David Chinner To: Lachlan McIlroy Cc: David Chinner , xfs-dev , xfs-oss Subject: Re: [PATCH, RFC] Delayed logging of file sizes Message-ID: <20071128020135.GM119954183@sgi.com> References: <20071125225928.GE114266761@sgi.com> <474A112D.2040006@sgi.com> <20071126011044.GG114266761@sgi.com> <474A2180.7000605@sgi.com> <20071126021515.GH114266761@sgi.com> <474A3A92.2040200@sgi.com> <20071126050300.GI114266761@sgi.com> <474B8F51.5030102@sgi.com> <20071127105358.GG119954183@sgi.com> <474CB9AE.9020604@sgi.com> Mime-Version: 1.0 Content-Type: 
text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <474CB9AE.9020604@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4934/Tue Nov 27 15:17:17 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13807 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Wed, Nov 28, 2007 at 11:43:26AM +1100, Lachlan McIlroy wrote: > David Chinner wrote: > >On Tue, Nov 27, 2007 at 02:30:25PM +1100, Lachlan McIlroy wrote: > >>David Chinner wrote: > >>>Sorry, i wasn't particularly clear. What I mean was that i_update_core > >>>might disappear completely with the changes I'm making. > >>> > >>>Basically, we have three different methods of marking the inode dirty > >>>at the moment - on the linux inode (mark_inode_dirty[_sync]()), the > >>>i_update_core = 1 for unlogged changes and logged changes are tracked > >>>via the > >>>inode log item in the AIL. > >>> > >>>One top of that, we have three different methods of flushing them - one > >>>from the generic code for inodes dirtied by mark_inode_dirty(), one from > >>>xfssyncd for inodes that are only dirtied by setting i_update_core = 1 > >>>and the other from the xfsaild when log tail pushing. > >>> > >>>Ideally we should only have a single method for pushing out inodes. The > >>>first > >>>step to that is tracking the dirty state in a single tree (the inode > >>>radix > >>>trees). That means we have to hook ->dirty_inode() to catch all dirtying > >>>via > >>>mark_inode_dirty[_sync]() and mark the inodes dirty in the radix tree. > >>>Then we > >>>need to use xfs_mark_inode_dirty_sync() everywhere that we dirty the > >>>inode. > >>Don't we already call mark_inode_dirty[_sync]() everywhere we dirty the > >>inode? > > > >Maybe. Maybe not. Tell me - does xfs_ichgtime() do the right thing? > > > >[ I do know the answer to this question and there's a day of kdb tracing > >behind the answer. 
I wrote a 15 line comment to explain what was going > >on in one of my patches. ] > > Are you referring to the !(inode->i_state & I_LOCK) check? Yup. > Anyway, since you know the answer why don't you enlighten me? When allocating a new inode, we mark the inode dirty when first setting the timestamps in xfs_dir_ialloc(). At the time this happens the inode is I_LOCK|I_NEW and hence mark_inode_dirty_sync() would just mark the inode dirty and *not* move it to the dirty list. Because unlock_new_inode() does not check the dirty state when removing the I_LOCK state, the inode is never moved to the dirty list if it is already dirty (unlike __sync_single_inode()). Further calls to mark_inode_dirty_sync() see the inode as dirty and don't move it to the dirty list, either. Hence the inode would never get flushed out by the generic code if we called mark_inode_dirty_sync() in that location. Why is it wrong? It should be checking I_NEW, not I_LOCK because all other cases where I_LOCK might be set are covered by the code that unlocks the inode. > >>>Once we have all the dirty state in the radix trees we can now get rid of > >>>i_update_core and i_update_size - all they do is mark the inode dirty and > >>>we don't really care about the difference between them(*) - and just use > >>>the dirty bit in the radix tree when necessary. > >>If we want to check if an inode is dirty do we have to look up the dirty > >>bit in the tree or is there some easy way to get it from the inode? > > > >xfs_inode_clean(ip) is my preferred interface. How that is finally > >implemented will be determined by how this all cleans up and what > >performs the best. If lockless tree lookups don't cause performance > >problems, then there is little reason to keep redundant information > >around. > I can't imagine that a tree lookup (lockless or not) would be faster > than dereferencing fields from the inode. 
If keeping the inode's dirty > flags and the ones in the radix tree in sync is an issue then maybe > tree lookups are a performance hit we can live with. I'm hoping to avoid this problem altogether by removing as many "is the inode dirty" checks as possible. If inode writeback is driven exclusively by the radix tree dirty bit via a traversal and we only write back logged changes, then I don't think we need to be checking if the inode is clean very often. That is, if we see the inode in xfs_flush_inode() then it is dirty at the linux level, so we log the inode. That makes the inode clean at the linux layer and dirty at the XFS level, and we know that as long as the inode remains in the AIL it is dirty. We only ever flush inodes based on a AIL push (which doesn't require dirty bits) or via the syncd, which looks up dirty inodes via the radix tree tag, and hence most of the dirty checks on the inode can go away because we don't need to check it during writeback now. > >>By consolidating the different ways of dirtying an inode we lose the > >>ability > >>to know why it is dirty and what action needs to be done to undirty it. > > > >The only way to undirty an inode is to write it to disk. > True. I was thinking about what may need to be done before we write it > to disk such as flushing the log but that would just be dependent on > whether the inode is pinned? Right, flushing the log is only needed if it is pinned. > >>For example if the inode log item has bits set then we know we have to > >>flush > >>the log otherwise there is no need. With a general purpose dirty bit we > > > >No, if the log item is present and dirty (i.e. inode is in the AIL), > >all it means is that we need to attach a callback to the buffer > >(xfs_iflush_done) when dispatching the I/O to do processing of the > >log item on I/O completion. Whether i_update_core is set or not > >in this case is irrelevant - the log item state overrides that. > > > >>will > >>have to flush regardless. 
And my recent attempt to fix the log replay
> >>issue
> >>relies on i_update_core to indicate there are unlogged changes - I don't
> >>see
> >>how that will work with these changes.
> >
> >But your changes could not be implemented, either. You can't log the inode
> >to clean it - it merely transfers the writeback from one list to
> >another.
> Could not be implemented? What was that patch I sent around then?

Sorry, I missed an important word there - could not be implemented
_efficiently_.

Basically, you are logging the inode, then calling xfs_iflush, which
immediately sees it pinned and forces the log. That's an extra transaction
*and* log I/O for every inode we write. That defeats all inode clustering
and will seriously harm performance. Also, the change fails to log changes
to inodes in the same cluster that get written out because they are dirty.

> >So, the cleaner fix is to do this - change the xfs_inode_flush()
> >just to unconditionally log the inode and don't do inode writeback *at
> >all* from there. That will catch all cases of unlogged changes and leave
> >inode writeback to tail-pushing or xfssyncd which can be driven by
> >the radix tree.
> Huh? Aren't we trying to minimize the number of transactions we do? My
> changes introduce new transactions but only when we have to. You're saying
> here that we log the inode unconditionally - how is that better? I'm not
> trying to defend my changes here (I don't care how the problem gets fixed)
> - I'm just trying to understand why your suggestions are a good idea.

Because we can log an entire inode cluster's worth of changes in a single
transaction. One transaction vs one I/O - it's a decent tradeoff to avoid
this problem, esp. as we'll get improved inode writeback clustering if we
flush from the radix tree (i.e. clusters get flushed in ascending inode
number order).....
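
As a rough illustration of that tradeoff, here is a toy cost model in C. It
is only a sketch under stated assumptions and not XFS code: struct wb_cost,
both function names and the figure of 32 inodes per cluster are made up for
the example, and it assumes that a single log force at the end covers every
transaction in the per-cluster scheme. It counts transactions and log forces
for "log each inode and flush it immediately" against "log a whole cluster's
worth of dirty inodes in one transaction".

```c
#include <assert.h>
#include <stddef.h>

/* Assumed for illustration only; the real number depends on fs geometry. */
#define INODES_PER_CLUSTER 32UL

/* Hypothetical bookkeeping: what one writeback pass costs in this model. */
struct wb_cost {
	unsigned long transactions;
	unsigned long log_forces;
};

/*
 * Scheme 1: log each dirty inode in its own transaction and flush it
 * straight away. The inode is still pinned, so every flush forces the log.
 */
static struct wb_cost cost_log_and_flush_each(size_t ndirty)
{
	struct wb_cost c = { ndirty, ndirty };

	return c;
}

/*
 * Scheme 2: walk the dirty inodes in ascending inode number order and log
 * all dirty inodes that share a cluster in a single transaction.
 */
static struct wb_cost cost_log_per_cluster(const unsigned long *inos,
					   size_t ndirty)
{
	struct wb_cost c = { 0, 0 };
	size_t i;

	for (i = 0; i < ndirty; i++) {
		/* The model relies on ascending order, as a tree walk gives. */
		assert(i == 0 || inos[i] > inos[i - 1]);

		/* A new transaction starts whenever the cluster changes. */
		if (i == 0 ||
		    inos[i] / INODES_PER_CLUSTER !=
		    inos[i - 1] / INODES_PER_CLUSTER)
			c.transactions++;
	}

	/* Assumption: one log force at the end covers all transactions. */
	c.log_forces = ndirty ? 1 : 0;
	return c;
}
```

With 64 dirty inodes numbered 0 to 63, this model charges the first scheme
64 transactions and 64 log forces, and the second scheme 2 transactions and
a single log force.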
> I do like the way it simplifies inode writeback though - a sync would > optionally log all the inodes and then just flush the log and that's it > (I think). Yup, pretty much. Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Tue Nov 27 18:31:17 2007 Received: with ECARTIS (v1.0.0; list xfs); Tue, 27 Nov 2007 18:31:22 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_23, T_STOX_BOUND_090909_B autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAS2VDSQ031238 for ; Tue, 27 Nov 2007 18:31:15 -0800 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA22412; Wed, 28 Nov 2007 13:31:15 +1100 Message-ID: <474CD2BA.8070204@sgi.com> Date: Wed, 28 Nov 2007 13:30:18 +1100 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.9 (X11/20071031) MIME-Version: 1.0 To: Christoph Hellwig CC: xfs@oss.sgi.com, xfs-dev Subject: Re: [PATCH] kill superflous buffer locking References: <20070924184926.GA20661@lst.de> <4716DD79.6040309@sgi.com> In-Reply-To: <4716DD79.6040309@sgi.com> Content-Type: multipart/mixed; boundary="------------030702060208020708020404" X-Virus-Scanned: ClamAV 0.91.2/4934/Tue Nov 27 15:17:17 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13808 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs This is a multi-part message in MIME format. 
--------------030702060208020708020404 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Christoph, We've fixed the source of the assertion (that was the bugs in xfs_buf_associate_memory()) so I'm pushing your buffer lock removal patch back in again. While looking through it I found a couple of issues: - It called unlock_page() before calls to PagePrivate() and PageUptodate(). I think the page needs to be locked during these calls so I moved the unlock_page() further down. - Unlocking the pages as we go can cause a double unlock in the error handling for a NULL page in the XBF_READ_AHEAD case so I removed the unlocking code for that case. Would you mind checking these changes? Lachlan Lachlan McIlroy wrote: > Christoph, > > We've had to reverse this change because it's caused a regression. > We haven't been able to identify why we see the following assertion > trigger with these changes but the assertion goes away without the > changes. Until we figure out why we'll have to leave the buffer > locking in. > > <5>XFS mounting filesystem hdb2 > <5>Starting XFS recovery on filesystem: hdb2 (logdev: internal) > <4>XFS: xlog_recover_process_data: bad clientid > <4>Assertion failed: 0, file: fs/xfs/xfs_log_recover.c, line: 2912 > <0>------------[ cut here ]------------ > <2>kernel BUG at fs/xfs/support/debug.c:81! 
> <0>invalid opcode: 0000 [#1] > <0>SMP > <4>Modules linked in: > <0>CPU: 2 > <0>EIP: 0060:[] Not tainted VLI > <0>EFLAGS: 00010286 (2.6.23-kali-26_xfs-debug #1) > <0>EIP is at assfail+0x1e/0x22 > <0>eax: 00000043 ebx: f3002a50 ecx: 00000001 edx: 00000086 > <0>esi: f56e2300 edi: f8fa5c28 ebp: efa67ae4 esp: efa67ad4 > <0>ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068 > <0>Process mount (pid: 15191, ti=efa66000 task=f7b43570 task.ti=efa66000) > <0>Stack: c05c8bda c05c6762 c05c4750 00000b60 efa67b1c c0269a35 00000004 > c05c5903 > <0> f3a14000 efa67ba8 f7e458c0 f8fa5c34 efa67bb8 f8fa6a38 0000000d > 00001e00 > <0> f8fa4000 00000000 efa67bf4 c026a566 f8fa4000 00000001 00000651 > 00000000 > <0>Call Trace: > <0> [] show_trace_log_lvl+0x1a/0x2f > <0> [] show_stack_log_lvl+0x9b/0xa3 > <0> [] show_registers+0x1b9/0x28b > <0> [] die+0x119/0x27b > <0> [] do_trap+0x8a/0xa3 > <0> [] do_invalid_op+0x88/0x92 > <0> [] error_code+0x72/0x78 > <0> [] xlog_recover_process_data+0x6a/0x1ff > <0> [] xlog_do_recovery_pass+0x810/0x9f3 > <0> [] xlog_do_log_recovery+0x62/0xe2 > <0> [] xlog_do_recover+0x1d/0x187 > <0> [] xlog_recover+0x88/0x95 > <0> [] xfs_log_mount+0x100/0x144 > <0> [] xfs_mountfs+0x278/0x639 > <0> [] xfs_mount+0x25c/0x2f7 > <0> [] xfs_fs_fill_super+0xab/0x1fd > <0> [] get_sb_bdev+0xd6/0x114 > <0> [] xfs_fs_get_sb+0x21/0x27 > <0> [] vfs_kern_mount+0x41/0x7a > <0> [] do_kern_mount+0x37/0xbd > <0> [] do_mount+0x566/0x5c0 > <0> [] sys_mount+0x6f/0xa9 > <0> [] sysenter_past_esp+0x5f/0x85 > <0> ======================= > <0>Code: 04 24 10 00 00 00 e8 2a e7 03 00 c9 c3 55 89 e5 83 ec 10 89 4c > 24 0c 89 54 24 08 89 44 24 04 c7 04 24 da 8b 5c c0 e8 07 bf e9 ff <0f> > 0b eb fe 55 83 e0 07 89 e5 57 bf 07 00 00 00 56 89 d6 53 89 > <0>EIP: [] assfail+0x1e/0x22 SS:ESP 0068:efa67ad4 > > Lachlan > > Christoph Hellwig wrote: >> There is no need to lock any page in xfs_buf.c because we operate >> on our own address_space and all locking is covered by the buffer >> semaphore. 
If we ever switch back to main blockdeive address_space >> as suggested e.g. for fsblock with a similar scheme the locking will >> have to be totally revised anyway because the current scheme is >> neither correct nor coherent with itself. >> >> >> Signed-off-by: Christoph Hellwig >> >> Index: linux-2.6-xfs/fs/xfs/linux-2.6/xfs_buf.c >> =================================================================== >> --- linux-2.6-xfs.orig/fs/xfs/linux-2.6/xfs_buf.c 2007-09-23 >> 13:28:00.000000000 +0200 >> +++ linux-2.6-xfs/fs/xfs/linux-2.6/xfs_buf.c 2007-09-23 >> 14:13:43.000000000 +0200 >> @@ -396,6 +396,7 @@ _xfs_buf_lookup_pages( >> congestion_wait(WRITE, HZ/50); >> goto retry; >> } >> + unlock_page(page); >> >> XFS_STATS_INC(xb_page_found); >> >> @@ -405,10 +406,7 @@ _xfs_buf_lookup_pages( >> ASSERT(!PagePrivate(page)); >> if (!PageUptodate(page)) { >> page_count--; >> - if (blocksize >= PAGE_CACHE_SIZE) { >> - if (flags & XBF_READ) >> - bp->b_locked = 1; >> - } else if (!PagePrivate(page)) { >> + if (blocksize < PAGE_CACHE_SIZE && !PagePrivate(page)) { >> if (test_page_region(page, offset, nbytes)) >> page_count++; >> } >> @@ -418,11 +416,6 @@ _xfs_buf_lookup_pages( >> offset = 0; >> } >> >> - if (!bp->b_locked) { >> - for (i = 0; i < bp->b_page_count; i++) >> - unlock_page(bp->b_pages[i]); >> - } >> - >> if (page_count == bp->b_page_count) >> bp->b_flags |= XBF_DONE; >> >> @@ -747,7 +740,6 @@ xfs_buf_associate_memory( >> bp->b_page_count = ++i; >> ptr += PAGE_CACHE_SIZE; >> } >> - bp->b_locked = 0; >> >> bp->b_count_desired = bp->b_buffer_length = len; >> bp->b_flags |= XBF_MAPPED; >> @@ -1093,25 +1085,13 @@ xfs_buf_iostart( >> return status; >> } >> >> -STATIC_INLINE int >> -_xfs_buf_iolocked( >> - xfs_buf_t *bp) >> -{ >> - ASSERT(bp->b_flags & (XBF_READ | XBF_WRITE)); >> - if (bp->b_flags & XBF_READ) >> - return bp->b_locked; >> - return 0; >> -} >> - >> STATIC_INLINE void >> _xfs_buf_ioend( >> xfs_buf_t *bp, >> int schedule) >> { >> - if 
(atomic_dec_and_test(&bp->b_io_remaining) == 1) { >> - bp->b_locked = 0; >> + if (atomic_dec_and_test(&bp->b_io_remaining) == 1) >> xfs_buf_ioend(bp, schedule); >> - } >> } >> >> STATIC int >> @@ -1146,10 +1126,6 @@ xfs_buf_bio_end_io( >> >> if (--bvec >= bio->bi_io_vec) >> prefetchw(&bvec->bv_page->flags); >> - >> - if (_xfs_buf_iolocked(bp)) { >> - unlock_page(page); >> - } >> } while (bvec >= bio->bi_io_vec); >> >> _xfs_buf_ioend(bp, 1); >> @@ -1161,13 +1137,12 @@ STATIC void >> _xfs_buf_ioapply( >> xfs_buf_t *bp) >> { >> - int i, rw, map_i, total_nr_pages, nr_pages; >> + int rw, map_i, total_nr_pages, nr_pages; >> struct bio *bio; >> int offset = bp->b_offset; >> int size = bp->b_count_desired; >> sector_t sector = bp->b_bn; >> unsigned int blocksize = bp->b_target->bt_bsize; >> - int locking = _xfs_buf_iolocked(bp); >> >> total_nr_pages = bp->b_page_count; >> map_i = 0; >> @@ -1190,7 +1165,7 @@ _xfs_buf_ioapply( >> * filesystem block size is not smaller than the page size. >> */ >> if ((bp->b_buffer_length < PAGE_CACHE_SIZE) && >> - (bp->b_flags & XBF_READ) && locking && >> + (bp->b_flags & XBF_READ) && >> (blocksize >= PAGE_CACHE_SIZE)) { >> bio = bio_alloc(GFP_NOIO, 1); >> >> @@ -1207,24 +1182,6 @@ _xfs_buf_ioapply( >> goto submit_io; >> } >> >> - /* Lock down the pages which we need to for the request */ >> - if (locking && (bp->b_flags & XBF_WRITE) && (bp->b_locked == 0)) { >> - for (i = 0; size; i++) { >> - int nbytes = PAGE_CACHE_SIZE - offset; >> - struct page *page = bp->b_pages[i]; >> - >> - if (nbytes > size) >> - nbytes = size; >> - >> - lock_page(page); >> - >> - size -= nbytes; >> - offset = 0; >> - } >> - offset = bp->b_offset; >> - size = bp->b_count_desired; >> - } >> - >> next_chunk: >> atomic_inc(&bp->b_io_remaining); >> nr_pages = BIO_MAX_SECTORS >> (PAGE_SHIFT - BBSHIFT); >> Index: linux-2.6-xfs/fs/xfs/linux-2.6/xfs_buf.h >> =================================================================== >> --- 
linux-2.6-xfs.orig/fs/xfs/linux-2.6/xfs_buf.h 2007-09-05 >> 11:17:42.000000000 +0200 >> +++ linux-2.6-xfs/fs/xfs/linux-2.6/xfs_buf.h 2007-09-23 >> 14:04:36.000000000 +0200 >> @@ -143,7 +143,6 @@ typedef struct xfs_buf { >> void *b_fspriv2; >> void *b_fspriv3; >> unsigned short b_error; /* error code on I/O */ >> - unsigned short b_locked; /* page array is locked */ >> unsigned int b_page_count; /* size of page array */ >> unsigned int b_offset; /* page offset in first page */ >> struct page **b_pages; /* array of page pointers */ >> Index: linux-2.6-xfs/fs/xfs/xfsidbg.c >> =================================================================== >> --- linux-2.6-xfs.orig/fs/xfs/xfsidbg.c 2007-09-23 >> 13:33:07.000000000 +0200 >> +++ linux-2.6-xfs/fs/xfs/xfsidbg.c 2007-09-23 14:04:36.000000000 +0200 >> @@ -2110,9 +2110,9 @@ print_xfs_buf( >> (unsigned long long) bp->b_file_offset, >> (unsigned long long) bp->b_buffer_length, >> bp->b_addr); >> - kdb_printf(" b_bn 0x%llx b_count_desired 0x%lx b_locked %d\n", >> + kdb_printf(" b_bn 0x%llx b_count_desired 0x%lxn", >> (unsigned long long)bp->b_bn, >> - (unsigned long) bp->b_count_desired, (int)bp->b_locked); >> + (unsigned long) bp->b_count_desired); >> kdb_printf(" b_queuetime %ld (now=%ld/age=%ld) b_io_remaining >> %d\n", >> bp->b_queuetime, jiffies, bp->b_queuetime + age, >> bp->b_io_remaining.counter); >> >> >> > > > --------------030702060208020708020404 Content-Type: text/x-patch; name="buflock.diff" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="buflock.diff" --- fs/xfs/linux-2.6/xfs_buf.c_1.249 2007-11-27 15:28:34.000000000 +1100 +++ fs/xfs/linux-2.6/xfs_buf.c 2007-11-28 13:02:38.000000000 +1100 @@ -387,8 +387,6 @@ _xfs_buf_lookup_pages( if (unlikely(page == NULL)) { if (flags & XBF_READ_AHEAD) { bp->b_page_count = i; - for (i = 0; i < bp->b_page_count; i++) - unlock_page(bp->b_pages[i]); return -ENOMEM; } @@ -418,24 +416,17 @@ _xfs_buf_lookup_pages( ASSERT(!PagePrivate(page)); if 
(!PageUptodate(page)) { page_count--; - if (blocksize >= PAGE_CACHE_SIZE) { - if (flags & XBF_READ) - bp->b_locked = 1; - } else if (!PagePrivate(page)) { + if (blocksize < PAGE_CACHE_SIZE && !PagePrivate(page)) { if (test_page_region(page, offset, nbytes)) page_count++; } } + unlock_page(page); bp->b_pages[i] = page; offset = 0; } - if (!bp->b_locked) { - for (i = 0; i < bp->b_page_count; i++) - unlock_page(bp->b_pages[i]); - } - if (page_count == bp->b_page_count) bp->b_flags |= XBF_DONE; @@ -752,7 +743,6 @@ xfs_buf_associate_memory( bp->b_pages[i] = mem_to_page((void *)pageaddr); pageaddr += PAGE_CACHE_SIZE; } - bp->b_locked = 0; bp->b_count_desired = len; bp->b_buffer_length = buflen; @@ -1099,25 +1089,13 @@ xfs_buf_iostart( return status; } -STATIC_INLINE int -_xfs_buf_iolocked( - xfs_buf_t *bp) -{ - ASSERT(bp->b_flags & (XBF_READ | XBF_WRITE)); - if (bp->b_flags & XBF_READ) - return bp->b_locked; - return 0; -} - STATIC_INLINE void _xfs_buf_ioend( xfs_buf_t *bp, int schedule) { - if (atomic_dec_and_test(&bp->b_io_remaining) == 1) { - bp->b_locked = 0; + if (atomic_dec_and_test(&bp->b_io_remaining) == 1) xfs_buf_ioend(bp, schedule); - } } STATIC int @@ -1152,10 +1130,6 @@ xfs_buf_bio_end_io( if (--bvec >= bio->bi_io_vec) prefetchw(&bvec->bv_page->flags); - - if (_xfs_buf_iolocked(bp)) { - unlock_page(page); - } } while (bvec >= bio->bi_io_vec); _xfs_buf_ioend(bp, 1); @@ -1167,13 +1141,12 @@ STATIC void _xfs_buf_ioapply( xfs_buf_t *bp) { - int i, rw, map_i, total_nr_pages, nr_pages; + int rw, map_i, total_nr_pages, nr_pages; struct bio *bio; int offset = bp->b_offset; int size = bp->b_count_desired; sector_t sector = bp->b_bn; unsigned int blocksize = bp->b_target->bt_bsize; - int locking = _xfs_buf_iolocked(bp); total_nr_pages = bp->b_page_count; map_i = 0; @@ -1196,7 +1169,7 @@ _xfs_buf_ioapply( * filesystem block size is not smaller than the page size. 
*/ if ((bp->b_buffer_length < PAGE_CACHE_SIZE) && - (bp->b_flags & XBF_READ) && locking && + (bp->b_flags & XBF_READ) && (blocksize >= PAGE_CACHE_SIZE)) { bio = bio_alloc(GFP_NOIO, 1); @@ -1213,24 +1186,6 @@ _xfs_buf_ioapply( goto submit_io; } - /* Lock down the pages which we need to for the request */ - if (locking && (bp->b_flags & XBF_WRITE) && (bp->b_locked == 0)) { - for (i = 0; size; i++) { - int nbytes = PAGE_CACHE_SIZE - offset; - struct page *page = bp->b_pages[i]; - - if (nbytes > size) - nbytes = size; - - lock_page(page); - - size -= nbytes; - offset = 0; - } - offset = bp->b_offset; - size = bp->b_count_desired; - } - next_chunk: atomic_inc(&bp->b_io_remaining); nr_pages = BIO_MAX_SECTORS >> (PAGE_SHIFT - BBSHIFT); --- fs/xfs/linux-2.6/xfs_buf.h_1.122 2007-11-27 15:28:36.000000000 +1100 +++ fs/xfs/linux-2.6/xfs_buf.h 2007-11-27 15:21:51.000000000 +1100 @@ -143,7 +143,6 @@ typedef struct xfs_buf { void *b_fspriv2; void *b_fspriv3; unsigned short b_error; /* error code on I/O */ - unsigned short b_locked; /* page array is locked */ unsigned int b_page_count; /* size of page array */ unsigned int b_offset; /* page offset in first page */ struct page **b_pages; /* array of page pointers */ --- fs/xfs/xfsidbg.c_1.340 2007-11-27 15:28:37.000000000 +1100 +++ fs/xfs/xfsidbg.c 2007-11-27 15:28:33.000000000 +1100 @@ -2007,9 +2007,9 @@ print_xfs_buf( (unsigned long long) bp->b_file_offset, (unsigned long long) bp->b_buffer_length, bp->b_addr); - kdb_printf(" b_bn 0x%llx b_count_desired 0x%lx b_locked %d\n", + kdb_printf(" b_bn 0x%llx b_count_desired 0x%lx\n", (unsigned long long)bp->b_bn, - (unsigned long) bp->b_count_desired, (int)bp->b_locked); + (unsigned long) bp->b_count_desired); kdb_printf(" b_queuetime %ld (now=%ld/age=%ld) b_io_remaining %d\n", bp->b_queuetime, jiffies, bp->b_queuetime + age, bp->b_io_remaining.counter); --------------030702060208020708020404-- From owner-xfs@oss.sgi.com Tue Nov 27 20:19:54 2007 Received: with ECARTIS (v1.0.0; list 
xfs); Tue, 27 Nov 2007 20:19:58 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAS4Jo7k014382 for ; Tue, 27 Nov 2007 20:19:51 -0800 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA24886; Wed, 28 Nov 2007 15:19:50 +1100 Message-ID: <474CEC2E.8000206@sgi.com> Date: Wed, 28 Nov 2007 15:18:54 +1100 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.9 (X11/20071031) MIME-Version: 1.0 To: David Chinner CC: xfs-dev , xfs-oss Subject: Re: [PATCH, RFC] Delayed logging of file sizes References: <20071125225928.GE114266761@sgi.com> <474A112D.2040006@sgi.com> <20071126011044.GG114266761@sgi.com> <474A2180.7000605@sgi.com> <20071126021515.GH114266761@sgi.com> <474A3A92.2040200@sgi.com> <20071126050300.GI114266761@sgi.com> <474B8F51.5030102@sgi.com> <20071127105358.GG119954183@sgi.com> <474CB9AE.9020604@sgi.com> <20071128020135.GM119954183@sgi.com> In-Reply-To: <20071128020135.GM119954183@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4934/Tue Nov 27 15:17:17 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13809 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs David Chinner wrote: > On Wed, Nov 28, 2007 at 11:43:26AM +1100, Lachlan McIlroy wrote: >> David Chinner wrote: >>> On Tue, Nov 27, 2007 at 02:30:25PM +1100, Lachlan McIlroy wrote: >>>> David Chinner wrote: >>>>> Sorry, i wasn't particularly clear. 
What I meant was that i_update_core >>>>> might disappear completely with the changes I'm making. >>>>> >>>>> Basically, we have three different methods of marking the inode dirty >>>>> at the moment - on the linux inode (mark_inode_dirty[_sync]()), the >>>>> i_update_core = 1 for unlogged changes and logged changes are tracked >>>>> via the >>>>> inode log item in the AIL. >>>>> >>>>> On top of that, we have three different methods of flushing them - one >>>>> from the generic code for inodes dirtied by mark_inode_dirty(), one from >>>>> xfssyncd for inodes that are only dirtied by setting i_update_core = 1 >>>>> and the other from the xfsaild when log tail pushing. >>>>> >>>>> Ideally we should only have a single method for pushing out inodes. The >>>>> first >>>>> step to that is tracking the dirty state in a single tree (the inode >>>>> radix >>>>> trees). That means we have to hook ->dirty_inode() to catch all dirtying >>>>> via >>>>> mark_inode_dirty[_sync]() and mark the inodes dirty in the radix tree. >>>>> Then we >>>>> need to use xfs_mark_inode_dirty_sync() everywhere that we dirty the >>>>> inode. >>>> Don't we already call mark_inode_dirty[_sync]() everywhere we dirty the >>>> inode? >>> Maybe. Maybe not. Tell me - does xfs_ichgtime() do the right thing? >>> >>> [ I do know the answer to this question and there's a day of kdb tracing >>> behind the answer. I wrote a 15 line comment to explain what was going >>> on in one of my patches. ] >> Are you referring to the !(inode->i_state & I_LOCK) check? > > Yup. I've never liked that check, can we just get rid of it? > >> Anyway, since you know the answer why don't you enlighten me? > > When allocating a new inode, we mark the inode dirty when first > setting the timestamps in xfs_dir_ialloc(). At the time this happens > the inode is I_LOCK|I_NEW and hence mark_inode_dirty_sync() would just > mark the inode dirty and *not* move it to the dirty list. 
> > Because unlock_new_inode() does not check the dirty state when > removing the I_LOCK state, the inode is never moved to the dirty list > if it is already dirty (unlike __sync_single_inode()). > > Further calls to mark_inode_dirty_sync() see the inode as dirty and > don't move it to the dirty list, either. Hence the inode would never > get flushed out by the generic code if we called > mark_inode_dirty_sync() in that location. > > Why is it wrong? It should be checking I_NEW, not I_LOCK because all > other cases where I_LOCK might be set are covered by the code that > unlocks the inode. > >>>>> Once we have all the dirty state in the radix trees we can now get rid of >>>>> i_update_core and i_update_size - all they do is mark the inode dirty and >>>>> we don't really care about the difference between them(*) - and just use >>>>> the dirty bit in the radix tree when necessary. >>>> If we want to check if an inode is dirty do we have to look up the dirty >>>> bit in the tree or is there some easy way to get it from the inode? >>> xfs_inode_clean(ip) is my preferred interface. How that is finally >>> implemented will be determined by how this all cleans up and what >>> performs the best. If lockless tree lookups don't cause performance >>> problems, then there is little reason to keep redundant information >>> around. >> I can't imagine that a tree lookup (lockless or not) would be faster >> than dereferencing fields from the inode. If keeping the inode's dirty >> flags and the ones in the radix tree in sync is an issue then maybe >> tree lookups are a performance hit we can live with. > > I'm hoping to avoid this problem altogether by removing as many > "is the inode dirty" checks as possible. If inode writeback is > driven exclusively by the radix tree dirty bit via a traversal > and we only write back logged changes, then I don't think we need > to be checking if the inode is clean very often. 
> > That is, if we see the inode in xfs_flush_inode() then it is > dirty at the linux level, so we log the inode. That makes the > inode clean at the linux layer and dirty at the XFS level, and > we know that as long as the inode remains in the AIL it is dirty. > > We only ever flush inodes based on an AIL push (which doesn't > require dirty bits) or via the syncd, which looks up dirty > inodes via the radix tree tag, and hence most of the dirty > checks on the inode can go away because we don't need to > check it during writeback now. > >>>> By consolidating the different ways of dirtying an inode we lose the >>>> ability >>>> to know why it is dirty and what action needs to be done to undirty it. >>> The only way to undirty an inode is to write it to disk. >> True. I was thinking about what may need to be done before we write it >> to disk such as flushing the log but that would just be dependent on >> whether the inode is pinned? > > Right, flushing the log is only needed if it is pinned. > >>>> For example if the inode log item has bits set then we know we have to >>>> flush >>>> the log otherwise there is no need. With a general purpose dirty bit we >>> No, if the log item is present and dirty (i.e. inode is in the AIL), >>> all it means is that we need to attach a callback to the buffer >>> (xfs_iflush_done) when dispatching the I/O to do processing of the >>> log item on I/O completion. Whether i_update_core is set or not >>> in this case is irrelevant - the log item state overrides that. >>> >>>> will >>>> have to flush regardless. And my recent attempt to fix the log replay >>>> issue >>>> relies on i_update_core to indicate there are unlogged changes - I don't >>>> see >>>> how that will work with these changes. >>> But your changes could not be implemented, either. You can't log the inode >>> to clean it - it merely transfers the writeback from one list to >>> another. >> Could not be implemented? What was that patch I sent around then? 
> > Sorry, I missed an important word there - could not be implemented > _efficiently_. > > Basically, you are logging the inode, then call xfs_iflush, which > immediately sees it pinned and forces the log. That's an extra > transaction *and* log I/O for every inode we write. That defeats all > inode clustering and will seriously harm performance. I didn't see another way around it. We only need to force the log for pinned inodes if it is a sync writeback, otherwise we can just try again later. > > Also, the change fails to log changes to inodes in the same > cluster that get written out because they are dirty. That's where it all sort of falls apart. I didn't want to log the inode in xfs_iflush_int() because we have the flush lock held and I was pretty sure logging a transaction with the flush lock held would be a bad idea. That's why I specifically removed the code that resets i_update_core in xfs_iflush_int() - so that other inodes in the same cluster will still be flagged as having unlogged changes even after the inodes have been synced to disk. But as I said it was an idea that needed some polishing. > >>> So, the cleaner fix is to do this - change the xfs_inode_flush() >>> just to unconditionally log the inode and don't do inode writeback *at >>> all* from there. That will catch all cases of unlogged changes and leave >>> inode writeback to tail-pushing or xfssyncd which can be driven by >>> the radix tree. >> Huh? Aren't we trying to minimize the number of transactions we do? My >> changes introduce new transactions but only when we have to. You're saying >> here that we log the inode unconditionally - how is that better? I'm not >> trying to defend my changes here (I don't care how the problem gets fixed) >> - I'm just trying to understand why your suggestions are a good idea. > > Because we can log an entire inode cluster's worth of changes in a single > transaction. One transaction vs one I/O - it's a decent tradeoff to > avoid this problem, esp. 
as we'll get improved inode writeback clustering > if we flush from the radix tree (i.e. clusters get flushed in ascending > inode number order)..... That should help a lot but it will use even more space in the log - quite a lot more if just one inode in the cluster needs to be logged. Do you plan to do this in the write_inode path? If so we'll have inodes that have been logged (with a previous cluster) that still have I_DIRTY set. When these inodes go through the write_inode path we'll need to skip the transaction. > >> I do like the way it simplifies inode writeback though - a sync would >> optionally log all the inodes and then just flush the log and that's it >> (I think). > > Yup, pretty much. > > Cheers, > > Dave. From owner-xfs@oss.sgi.com Wed Nov 28 01:07:35 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 28 Nov 2007 01:07:38 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAS97WLV025272 for ; Wed, 28 Nov 2007 01:07:33 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id UAA01716; Wed, 28 Nov 2007 20:07:39 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAS97ddD123310187; Wed, 28 Nov 2007 20:07:39 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAS97bYn123065681; Wed, 28 Nov 2007 20:07:37 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Wed, 28 Nov 2007 20:07:37 +1100 From: David Chinner To: Lachlan McIlroy Cc: David Chinner , xfs-dev , xfs-oss 
Subject: Re: [PATCH, RFC] Delayed logging of file sizes Message-ID: <20071128090737.GO119954183@sgi.com> References: <20071126011044.GG114266761@sgi.com> <474A2180.7000605@sgi.com> <20071126021515.GH114266761@sgi.com> <474A3A92.2040200@sgi.com> <20071126050300.GI114266761@sgi.com> <474B8F51.5030102@sgi.com> <20071127105358.GG119954183@sgi.com> <474CB9AE.9020604@sgi.com> <20071128020135.GM119954183@sgi.com> <474CEC2E.8000206@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <474CEC2E.8000206@sgi.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4934/Tue Nov 27 15:17:17 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13810 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Wed, Nov 28, 2007 at 03:18:54PM +1100, Lachlan McIlroy wrote: > David Chinner wrote: > >>>[ I do know the answer to this question and there's a day of kdb tracing > >>>behind the answer. I wrote a 15 line comment to explain what was going > >>>on in one of my patches. ] > >>Are you referring to the !(inode->i_state & I_LOCK) check? > > > >Yup. > I've never liked that check, can we just get rid of it? No - removing the check is what led me to understand why it was necessary. > >Sorry, I missed an important word there - could not be implemented > >_efficiently_. > > > >Basically, you are logging the inode, then call xfs_iflush, which > >immediately sees it pinned and forces the log. That's an extra > >transaction *and* log I/O for every inode we write. That defeats all > >inode clustering and will seriously harm performance. > I didn't see another way around it. We only need to force the log for > pinned inodes if it is a sync writeback, otherwise we can just try again > later. But we don't do that right now - we call xfs_ipinwait() in xfs_iflush() which forces the log. 
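[ The two policies contrasted in this exchange can be put side by side in a toy userspace model. This is only a sketch under stated assumptions, not XFS code: the toy_* names, the enum and the "pinned"/"sync" integer flags are hypothetical stand-ins, and the real xfs_iflush()/xfs_ipinwait() do considerably more than this. It merely counts how many log forces each policy would issue for a batch of inodes. ]

```c
#include <assert.h>

/* Hypothetical outcomes of trying to flush one inode. */
enum toy_flush_action {
	TOY_WRITE_INODE,	/* not pinned: write it back now */
	TOY_FORCE_LOG,		/* pinned: force the log before writing */
	TOY_RETRY_LATER		/* pinned: skip it and try again later */
};

/* Behaviour described as current: a pinned inode always forces the log. */
static enum toy_flush_action toy_flush_current(int pinned, int sync)
{
	(void)sync;	/* the sync/async distinction is ignored here */
	return pinned ? TOY_FORCE_LOG : TOY_WRITE_INODE;
}

/* Suggested alternative: force the log only for sync writeback. */
static enum toy_flush_action toy_flush_suggested(int pinned, int sync)
{
	if (!pinned)
		return TOY_WRITE_INODE;
	return sync ? TOY_FORCE_LOG : TOY_RETRY_LATER;
}

/* Count the log forces a policy issues over n inodes. */
static int toy_count_log_forces(enum toy_flush_action (*policy)(int, int),
				const int *pinned, int n, int sync)
{
	int i, forces = 0;

	assert(n >= 0);
	for (i = 0; i < n; i++) {
		if (policy(pinned[i], sync) == TOY_FORCE_LOG)
			forces++;
	}
	return forces;
}
```

[ With three pinned inodes out of four, the "current" policy forces the log three times whether or not the writeback is sync, while the suggested one forces it only in the sync case. ]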
> >Also, the change fails to log changes to inodes in the same > >cluster that get written out because they are dirty. > That's where it all sort of falls apart. I didn't want to log the inode > in xfs_iflush_int() because we have the flush lock held and I was pretty > sure logging a transaction with the flush lock held would be a bad idea. > That's why I specifically removed the code that resets i_update_core in > xfs_iflush_int() - so that other inodes in the same cluster will still be > flagged as having unlogged changes even after the inodes have been synced > to disk. But as I said it was an idea that needed some polishing. It's messy, and if we are logging changes then we should never write to disk unlogged changes.... > >>>So, the cleaner fix is to do this - change the xfs_inode_flush() > >>>just to unconditionally log the inode and don't do inode writeback *at > >>>all* from there. That will catch all cases of unlogged changes and leave > >>>inode writeback to tail-pushing or xfssyncd which can be driven by > >>>the radix tree. > >>Huh? Aren't we trying to minimize the number of transactions we do? My > >>changes introduce new transactions but only when we have to. You're > >>saying > >>here that we log the inode unconditionally - how is that better? I'm not > >>trying to defend my changes here (I don't care how the problem gets fixed) > >>- I'm just trying to understand why your suggestions are a good idea. > > > >Because we can log an entire inode cluster's worth of changes in a single > >transaction. One transaction vs one I/O - it's a decent tradeoff to > >avoid this problem, esp. as we'll get improved inode writeback clustering > >if we flush from the radix tree (i.e. clusters get flushed in ascending > >inode number order)..... > That should help a lot but it will use even more space in the log - quite a > lot more if just one inode in the cluster needs to be logged. 32 inodes * 100 bytes for the inode core - 3k per inode if we log the entire cluster. 
But we know what is dirty and what isn't, so that's the worst case. i.e. we only log those that are dirty. It's not log space I'm worried about here - it's the transaction overhead.... Still, doing it as a cluster is probably premature optimisation. Let's see what logging only the dirty inodes gets us and go from there. > Do you plan > to do this in the write_inode path? If so we'll have inodes that have been > logged (with a previous cluster) that still have I_DIRTY set. I'll cross that one if we need to - we can probably just clear it if the inode is not otherwise dirty... Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Wed Nov 28 01:48:02 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 28 Nov 2007 01:48:10 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from verein.lst.de (verein.lst.de [213.95.11.210]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAS9lxoP029449 for ; Wed, 28 Nov 2007 01:48:01 -0800 Received: from verein.lst.de (localhost [127.0.0.1]) by verein.lst.de (8.12.3/8.12.3/Debian-7.1) with ESMTP id lAS9m4F3008264 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Wed, 28 Nov 2007 10:48:04 +0100 Received: (from hch@localhost) by verein.lst.de (8.12.3/8.12.3/Debian-6.6) id lAS9m2r1008262; Wed, 28 Nov 2007 10:48:02 +0100 Date: Wed, 28 Nov 2007 10:48:02 +0100 From: Christoph Hellwig To: Lachlan McIlroy Cc: Christoph Hellwig , xfs@oss.sgi.com, xfs-dev Subject: Re: [PATCH] kill superflous buffer locking Message-ID: <20071128094802.GB7760@lst.de> References: <20070924184926.GA20661@lst.de> <4716DD79.6040309@sgi.com> <474CD2BA.8070204@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <474CD2BA.8070204@sgi.com> User-Agent: Mutt/1.3.28i 
X-Scanned-By: MIMEDefang 2.39 X-Virus-Scanned: ClamAV 0.91.2/4934/Tue Nov 27 15:17:17 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13811 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@lst.de Precedence: bulk X-list: xfs On Wed, Nov 28, 2007 at 01:30:18PM +1100, Lachlan McIlroy wrote: > Christoph, > > We've fixed the source of the assertion (that was the bugs in > xfs_buf_associate_memory()) so I'm pushing your buffer lock > removal patch back in again. > > While looking through it I found a couple of issues: > > - It called unlock_page() before calls to PagePrivate() and > PageUptodate(). I think the page needs to be locked during > these calls so I moved the unlock_page() further down. This doesn't really matter at all. XFS is the only user of the address_space the pages reside in and we never have overlapping buffers. That's the reason why we can remove the buffer locking. Now if there was a variant of find_or_create_page that didn't set pages locked at all we could happily use it and get rid of the last place we deal with locked pages. > - Unlocking the pages as we go can cause a double unlock in the > error handling for a NULL page in the XBF_READ_AHEAD case so I > removed the unlocking code for that case. Indeed. Thanks for spotting this. 
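[ The double unlock discussed above can be reproduced in a small userspace model. This is a minimal sketch, assuming a toy page that only tracks a lock bit; struct toy_page, the toy_* helpers and the fail_at/unlock_on_error parameters are all hypothetical and are not the kernel's struct page, lock_page() or _xfs_buf_lookup_pages(). It shows why, once pages are unlocked as they are gathered, the error path for a NULL page must no longer unlock the pages collected so far. ]

```c
#include <assert.h>

/* Toy stand-in for a page cache page: it only tracks lock state. */
struct toy_page {
	int locked;
};

/* Number of times an already-unlocked page was unlocked again. */
static int toy_double_unlocks;

static void toy_lock_page(struct toy_page *page)
{
	page->locked = 1;
}

static void toy_unlock_page(struct toy_page *page)
{
	if (!page->locked)
		toy_double_unlocks++;
	page->locked = 0;
}

/*
 * Gather npages pages, unlocking each one as soon as it has been examined.
 * fail_at simulates a NULL page at that index (-1 means no failure).
 * unlock_on_error keeps the old error path that unlocks every page
 * gathered so far. Returns 0 on success, -1 on the simulated failure.
 */
static int toy_lookup_pages(struct toy_page *pages, int npages,
			    int fail_at, int unlock_on_error)
{
	int i, j;

	for (i = 0; i < npages; i++) {
		if (i == fail_at) {
			if (unlock_on_error) {
				for (j = 0; j < i; j++)
					toy_unlock_page(&pages[j]);
			}
			return -1;
		}
		toy_lock_page(&pages[i]);	/* the lookup hands back a locked page */
		toy_unlock_page(&pages[i]);	/* unlock as we go */
	}
	return 0;
}

/* Run one variant with a failure at index 2; return its double unlock count. */
static int toy_count_double_unlocks(int unlock_on_error)
{
	struct toy_page pages[4] = { {0}, {0}, {0}, {0} };
	int ret;

	toy_double_unlocks = 0;
	ret = toy_lookup_pages(pages, 4, 2, unlock_on_error);
	assert(ret == -1);
	(void)ret;
	return toy_double_unlocks;
}
```

[ Calling toy_count_double_unlocks(1) models the combination of "unlock as we go" with the old error path and reports one double unlock per page gathered before the failure; toy_count_double_unlocks(0) models the error path with the unlock loop removed and reports none. ]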
From owner-xfs@oss.sgi.com Wed Nov 28 06:15:40 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 28 Nov 2007 06:15:47 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.4 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from ninsei.hu (ninsei.hu [212.92.23.158]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lASEFbaZ000344 for ; Wed, 28 Nov 2007 06:15:40 -0800 Received: from luba (pb-d-128-141-57-252.cern.ch [128.141.57.252]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by chatsubo.ninsei.hu (Postfix) with ESMTP id 09DC97A5E for ; Wed, 28 Nov 2007 14:45:15 +0100 (CET) Received: by luba (Postfix, from userid 32266) id D9E7810B444; Wed, 28 Nov 2007 14:45:23 +0100 (CET) Date: Wed, 28 Nov 2007 14:45:23 +0100 From: KELEMEN Peter To: xfs@oss.sgi.com Subject: 2.6.24-rc3 oopses while mounting fs Message-ID: <20071128134523.GF7793@luba> Mail-Followup-To: xfs@oss.sgi.com MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit Organization: CERN European Laboratory for Particle Physics, Switzerland X-GPG-KeyID: 1024D/9FF0CABE 2004-04-03 X-GPG-Fingerprint: 6C9E 5917 3B06 E4EE 6356 7BF0 8F3E CAB6 9FF0 CABE X-Comment: Personal opinion. Paragraphs might have been reformatted. X-Copyright: Forwarding or publishing without permission is prohibited. X-Accept-Language: hu,en User-Agent: Mutt/1.5.17 (2007-11-01) X-Virus-Scanned: ClamAV 0.91.2/4936/Wed Nov 28 02:55:15 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13812 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: Peter.Kelemen@cern.ch Precedence: bulk X-list: xfs Box is x86_64 with RHEL4 userspace. I have the log image saved and I can play with the filesystem if needed. 
Peter SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled SGI XFS Quota Management subsystem XFS mounting filesystem sdc Starting XFS recovery on filesystem: sdc (logdev: internal) Unable to handle kernel paging request at ffffc20001c9b000 RIP: [] :xfs:xlog_recover_add_to_trans+0x64/0xef PGD 15781d067 PUD 15781c067 PMD 130fda067 PTE 0 Oops: 0000 [1] SMP CPU 0 Modules linked in: xfs sd_mod 3w_9xxx scsi_mod ohci_hcd uhci_hcd ehci_hcd ipv6 bnx2 sky2 r8169 ns83820 dl2k acenic e100 tg3 e1000 mii Pid: 2434, comm: mount Not tainted 2.6.24-rc3 #1 RIP: 0010:[] [] :xfs:xlog_recover_add_to_trans+0x64/0xef RSP: 0018:ffff81013240f728 EFLAGS: 00010286 RAX: ffffc20001c9c000 RBX: 000000000020bd78 RCX: 00000000001cc280 RDX: 0000000000000000 RSI: ffffc20001c9b000 RDI: ffffc20001cdbaf8 RBP: ffff81013240f758 R08: ffff8101578011a2 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000002 R12: ffff8101318e4220 R13: ffffc20001c5b508 R14: ffff8101318e4770 R15: ffffc20001c5b4fc FS: 00002b0dc2141b00(0000) GS:ffffffff805ef000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffc20001c9b000 CR3: 000000013243d000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process mount (pid: 2434, threadinfo ffff81013240e000, task ffff8101320a4000) Stack: 0020bd785769c5e8 000000005769c5e8 000000000000000a ffffc20001c5b508 000000000000011e ffffc20001c5b4fc ffff81013240f7b8 ffffffff881c79db ffffc20001c6a000 00000001881da0bc ffff81013246c000 ffff81013240f828 Call Trace: [] :xfs:xlog_recover_process_data+0x184/0x1d9 [] :xfs:xlog_do_recovery_pass+0x74b/0x795 [] :xfs:xlog_do_log_recovery+0x3d/0x82 [] :xfs:xlog_do_recover+0x12/0x11c [] :xfs:xlog_recover+0x84/0x92 [] :xfs:xfs_log_mount+0x8c/0xe4 [] :xfs:xfs_mountfs+0x67d/0x97b [] 
:xfs:xfs_mru_cache_create+0x170/0x1d5 [] :xfs:xfs_fstrm_free_func+0x0/0x81 [] :xfs:xfs_ioinit+0xb/0xd [] :xfs:xfs_mount+0x2bb/0x36b [] :xfs:xfs_fs_fill_super+0xd2/0x245 [] get_filesystem+0x17/0x39 [] sget+0x3fb/0x418 [] sget+0x403/0x418 [] set_bdev_super+0x0/0x14 [] get_sb_bdev+0x123/0x16f [] :xfs:xfs_fs_fill_super+0x0/0x245 [] :xfs:xfs_fs_get_sb+0x13/0x18 [] vfs_kern_mount+0x8f/0x11c [] do_kern_mount+0x44/0xf4 [] do_mount+0x6d8/0x71e [] __up_read+0x93/0x9b [] up_read+0x23/0x27 [] do_page_fault+0x42e/0x7c7 [] zone_statistics+0x64/0x69 [] __alloc_pages+0x6b/0x311 [] sys_mount+0x8a/0xcc [] system_call+0x7e/0x83 Code: f3 a4 49 89 c7 49 8b 54 24 08 8b 42 18 85 c0 74 0e 3b 42 14 -- .+'''+. .+'''+. .+'''+. .+'''+. .+'' Kelemen Péter / \ / \ Peter.Kelemen@cern.ch .+' `+...+' `+...+' `+...+' `+...+' From owner-xfs@oss.sgi.com Wed Nov 28 15:57:34 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 28 Nov 2007 15:57:36 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_43, T_STOX_BOUND_090909_B autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lASNvR6h029975 for ; Wed, 28 Nov 2007 15:57:32 -0800 Received: from [134.14.55.78] (redback.melbourne.sgi.com [134.14.55.78]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA27149 for ; Thu, 29 Nov 2007 10:57:36 +1100 Message-ID: <474E003A.7020000@sgi.com> Date: Thu, 29 Nov 2007 10:56:42 +1100 From: Lachlan McIlroy Reply-To: lachlan@sgi.com User-Agent: Thunderbird 2.0.0.9 (X11/20071031) MIME-Version: 1.0 To: xfs@oss.sgi.com Subject: Re: 2.6.24-rc3 oopses while mounting fs References: <20071128134523.GF7793@luba> In-Reply-To: <20071128134523.GF7793@luba> Content-Type: multipart/mixed; boundary="------------040500030409020304090304" 
X-Virus-Scanned: ClamAV 0.91.2/4948/Wed Nov 28 12:42:33 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13813 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs This is a multi-part message in MIME format. --------------040500030409020304090304 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Peter, This looks like a problem we've just fixed, try this patch. We'll get this to mainline soon. Lachlan KELEMEN Peter wrote: > Box is x86_64 with RHEL4 userspace. I have the log image saved > and I can play with the filesystem if needed. > > Peter > > SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled > SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled > SGI XFS Quota Management subsystem > XFS mounting filesystem sdc > Starting XFS recovery on filesystem: sdc (logdev: internal) > Unable to handle kernel paging request at ffffc20001c9b000 RIP: > [] :xfs:xlog_recover_add_to_trans+0x64/0xef > PGD 15781d067 PUD 15781c067 PMD 130fda067 PTE 0 > Oops: 0000 [1] SMP > CPU 0 > Modules linked in: xfs sd_mod 3w_9xxx scsi_mod ohci_hcd uhci_hcd ehci_hcd ipv6 bnx2 sky2 r8169 ns83820 dl2k acenic e100 tg3 e1000 mii > Pid: 2434, comm: mount Not tainted 2.6.24-rc3 #1 > RIP: 0010:[] [] :xfs:xlog_recover_add_to_trans+0x64/0xef > RSP: 0018:ffff81013240f728 EFLAGS: 00010286 > RAX: ffffc20001c9c000 RBX: 000000000020bd78 RCX: 00000000001cc280 > RDX: 0000000000000000 RSI: ffffc20001c9b000 RDI: ffffc20001cdbaf8 > RBP: ffff81013240f758 R08: ffff8101578011a2 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000002 R12: ffff8101318e4220 > R13: ffffc20001c5b508 R14: ffff8101318e4770 R15: ffffc20001c5b4fc > FS: 00002b0dc2141b00(0000) GS:ffffffff805ef000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > CR2: 
ffffc20001c9b000 CR3: 000000013243d000 CR4: 00000000000006e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process mount (pid: 2434, threadinfo ffff81013240e000, task ffff8101320a4000) > Stack: 0020bd785769c5e8 000000005769c5e8 000000000000000a ffffc20001c5b508 > 000000000000011e ffffc20001c5b4fc ffff81013240f7b8 ffffffff881c79db > ffffc20001c6a000 00000001881da0bc ffff81013246c000 ffff81013240f828 > Call Trace: > [] :xfs:xlog_recover_process_data+0x184/0x1d9 > [] :xfs:xlog_do_recovery_pass+0x74b/0x795 > [] :xfs:xlog_do_log_recovery+0x3d/0x82 > [] :xfs:xlog_do_recover+0x12/0x11c > [] :xfs:xlog_recover+0x84/0x92 > [] :xfs:xfs_log_mount+0x8c/0xe4 > [] :xfs:xfs_mountfs+0x67d/0x97b > [] :xfs:xfs_mru_cache_create+0x170/0x1d5 > [] :xfs:xfs_fstrm_free_func+0x0/0x81 > [] :xfs:xfs_ioinit+0xb/0xd > [] :xfs:xfs_mount+0x2bb/0x36b > [] :xfs:xfs_fs_fill_super+0xd2/0x245 > [] get_filesystem+0x17/0x39 > [] sget+0x3fb/0x418 > [] sget+0x403/0x418 > [] set_bdev_super+0x0/0x14 > [] get_sb_bdev+0x123/0x16f > [] :xfs:xfs_fs_fill_super+0x0/0x245 > [] :xfs:xfs_fs_get_sb+0x13/0x18 > [] vfs_kern_mount+0x8f/0x11c > [] do_kern_mount+0x44/0xf4 > [] do_mount+0x6d8/0x71e > [] __up_read+0x93/0x9b > [] up_read+0x23/0x27 > [] do_page_fault+0x42e/0x7c7 > [] zone_statistics+0x64/0x69 > [] __alloc_pages+0x6b/0x311 > [] sys_mount+0x8a/0xcc > [] system_call+0x7e/0x83 > > > Code: f3 a4 49 89 c7 49 8b 54 24 08 8b 42 18 85 c0 74 0e 3b 42 14 > --------------040500030409020304090304 Content-Type: text/plain; name="patch" Content-Transfer-Encoding: base64 Content-Disposition: inline; filename="patch" LS0tIGEvZnMveGZzL2xpbnV4LTIuNi94ZnNfYnVmLmMJMjAwNy0xMS0yOSAx MDo1MzowMS4wMDAwMDAwMDAgKzExMDAKKysrIGIvZnMveGZzL2xpbnV4LTIu Ni94ZnNfYnVmLmMJMjAwNy0xMS0yOSAxMDo1MzowMS4wMDAwMDAwMDAgKzEx MDAKQEAgLTcyNSwxNSArNzI1LDE1IEBACiB7CiAJaW50CQkJcnZhbDsKIAlp bnQJCQlpID0gMDsKLQlzaXplX3QJCQlwdHI7Ci0Jc2l6ZV90CQkJZW5kLCBl 
bmRfY3VyOwotCW9mZl90CQkJb2Zmc2V0OworCXVuc2lnbmVkIGxvbmcJCXBh Z2VhZGRyOworCXVuc2lnbmVkIGxvbmcJCW9mZnNldDsKKwlzaXplX3QJCQli dWZsZW47CiAJaW50CQkJcGFnZV9jb3VudDsKIAotCXBhZ2VfY291bnQgPSBQ QUdFX0NBQ0hFX0FMSUdOKGxlbikgPj4gUEFHRV9DQUNIRV9TSElGVDsKLQlv ZmZzZXQgPSAob2ZmX3QpIG1lbSAtICgob2ZmX3QpbWVtICYgUEFHRV9DQUNI RV9NQVNLKTsKLQlpZiAob2Zmc2V0ICYmIChsZW4gPiBQQUdFX0NBQ0hFX1NJ WkUpKQotCQlwYWdlX2NvdW50Kys7CisJcGFnZWFkZHIgPSAodW5zaWduZWQg bG9uZyltZW0gJiBQQUdFX0NBQ0hFX01BU0s7CisJb2Zmc2V0ID0gKHVuc2ln bmVkIGxvbmcpbWVtIC0gcGFnZWFkZHI7CisJYnVmbGVuID0gUEFHRV9DQUNI RV9BTElHTihsZW4gKyBvZmZzZXQpOworCXBhZ2VfY291bnQgPSBidWZsZW4g Pj4gUEFHRV9DQUNIRV9TSElGVDsKIAogCS8qIEZyZWUgYW55IHByZXZpb3Vz IHNldCBvZiBwYWdlIHBvaW50ZXJzICovCiAJaWYgKGJwLT5iX3BhZ2VzKQpA QCAtNzQ3LDIyICs3NDcsMTUgQEAKIAkJcmV0dXJuIHJ2YWw7CiAKIAlicC0+ Yl9vZmZzZXQgPSBvZmZzZXQ7Ci0JcHRyID0gKHNpemVfdCkgbWVtICYgUEFH RV9DQUNIRV9NQVNLOwotCWVuZCA9IFBBR0VfQ0FDSEVfQUxJR04oKHNpemVf dCkgbWVtICsgbGVuKTsKLQllbmRfY3VyID0gZW5kOwotCS8qIHNldCB1cCBm aXJzdCBwYWdlICovCi0JYnAtPmJfcGFnZXNbMF0gPSBtZW1fdG9fcGFnZSht ZW0pOwotCi0JcHRyICs9IFBBR0VfQ0FDSEVfU0laRTsKLQlicC0+Yl9wYWdl X2NvdW50ID0gKytpOwotCXdoaWxlIChwdHIgPCBlbmQpIHsKLQkJYnAtPmJf cGFnZXNbaV0gPSBtZW1fdG9fcGFnZSgodm9pZCAqKXB0cik7Ci0JCWJwLT5i X3BhZ2VfY291bnQgPSArK2k7Ci0JCXB0ciArPSBQQUdFX0NBQ0hFX1NJWkU7 CisKKwlmb3IgKGkgPSAwOyBpIDwgYnAtPmJfcGFnZV9jb3VudDsgaSsrKSB7 CisJCWJwLT5iX3BhZ2VzW2ldID0gbWVtX3RvX3BhZ2UoKHZvaWQgKilwYWdl YWRkcik7CisJCXBhZ2VhZGRyICs9IFBBR0VfQ0FDSEVfU0laRTsKIAl9CiAJ YnAtPmJfbG9ja2VkID0gMDsKIAotCWJwLT5iX2NvdW50X2Rlc2lyZWQgPSBi cC0+Yl9idWZmZXJfbGVuZ3RoID0gbGVuOworCWJwLT5iX2NvdW50X2Rlc2ly ZWQgPSBsZW47CisJYnAtPmJfYnVmZmVyX2xlbmd0aCA9IGJ1ZmxlbjsKIAli cC0+Yl9mbGFncyB8PSBYQkZfTUFQUEVEOwogCiAJcmV0dXJuIDA7Cg== --------------040500030409020304090304-- From owner-xfs@oss.sgi.com Wed Nov 28 16:44:27 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 28 Nov 2007 16:44:33 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, 
score=-2.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAT0iOgG008037 for ; Wed, 28 Nov 2007 16:44:26 -0800 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA28701; Thu, 29 Nov 2007 11:44:29 +1100 Received: by chook.melbourne.sgi.com (Postfix, from userid 44625) id 34EA658C4C0F; Thu, 29 Nov 2007 11:44:29 +1100 (EST) To: sgi.bugs.xfs@engr.sgi.com, xfs@oss.sgi.com Subject: PARTIAL TAKE 971186 - kill superflous buffer locking (2nd attempt) Message-Id: <20071129004429.34EA658C4C0F@chook.melbourne.sgi.com> Date: Thu, 29 Nov 2007 11:44:29 +1100 (EST) From: lachlan@sgi.com (Lachlan McIlroy) X-Virus-Scanned: ClamAV 0.91.2/4948/Wed Nov 28 12:42:33 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13814 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs kill superflous buffer locking (2nd attempt) There is no need to lock any page in xfs_buf.c because we operate on our own address_space and all locking is covered by the buffer semaphore. If we ever switch back to the main block device address_space, as suggested e.g. for fsblock with a similar scheme, the locking will have to be totally revised anyway, because the current scheme is neither correct nor coherent with itself. 
Signed-off-by: Christoph Hellwig Date: Thu Nov 29 11:43:17 AEDT 2007 Workarea: redback.melbourne.sgi.com:/home/lachlan/isms/2.6.x-buflock Inspected by: hch Author: lachlan The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/linux/2.6.x-xfs-melb Modid: xfs-linux-melb:xfs-kern:30156a fs/xfs/xfsidbg.c - 1.341 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfsidbg.c.diff?r1=text&tr1=1.341&r2=text&tr2=1.340&f=h fs/xfs/linux-2.6/xfs_buf.h - 1.123 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.6/xfs_buf.h.diff?r1=text&tr1=1.123&r2=text&tr2=1.122&f=h fs/xfs/linux-2.6/xfs_buf.c - 1.250 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.6/xfs_buf.c.diff?r1=text&tr1=1.250&r2=text&tr2=1.249&f=h - kill superflous buffer locking (2nd attempt) From owner-xfs@oss.sgi.com Wed Nov 28 20:03:34 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 28 Nov 2007 20:03:40 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS autolearn=ham version=3.3.0-r574664 Received: from sandeen.net (sandeen.net [209.173.210.139]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAT43WpX003195 for ; Wed, 28 Nov 2007 20:03:34 -0800 Received: from Liberator.local (liberator.sandeen.net [10.0.0.4]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by sandeen.net (Postfix) with ESMTP id C8EF41800449D for ; Wed, 28 Nov 2007 22:03:39 -0600 (CST) Message-ID: <474E3A1C.4050302@sandeen.net> Date: Wed, 28 Nov 2007 22:03:40 -0600 From: Eric Sandeen User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031) MIME-Version: 1.0 To: xfs-oss Subject: [PATCH] make xfs_info work on mountpoints with spaces Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4948/Wed Nov 28 12:42:33 2007 on oss.sgi.com X-Virus-Status: 
Clean X-archive-position: 13815 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: sandeen@sandeen.net Precedence: bulk X-list: xfs I don't know why one would make mountpoints with spaces, but apparently people do: [Bug 402621] New: xfs_info doesn't handle mountpoint with spaces https://bugzilla.redhat.com/show_bug.cgi?id=402621 I'm no bash expert but I think this solves it: Signed-off-by: Eric Sandeen --- --- xfs-cmds.orig/xfsprogs/growfs/xfs_info.sh 2007-11-28 09:14:02.000000000 -0600 +++ xfs-cmds/xfsprogs/growfs/xfs_info.sh 2007-11-28 09:21:49.000000000 -0600 @@ -16,10 +16,10 @@ ;; esac done -set -- extra $@ +set -- extra "$@" shift $OPTIND case $# in - 1) xfs_growfs -p xfs_info -n $OPTS $1 + 1) xfs_growfs -p xfs_info -n $OPTS "$1" status=$? ;; *) echo $USAGE 1>&2 From owner-xfs@oss.sgi.com Wed Nov 28 20:12:26 2007 Received: with ECARTIS (v1.0.0; list xfs); Wed, 28 Nov 2007 20:12:30 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.5 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_45, J_CHICKENPOX_63,J_CHICKENPOX_66 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAT4CLt8004784 for ; Wed, 28 Nov 2007 20:12:24 -0800 Received: from pc-bnaujok.melbourne.sgi.com (pc-bnaujok.melbourne.sgi.com [134.14.55.58]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA05329; Thu, 29 Nov 2007 15:12:29 +1100 To: "xfs@oss.sgi.com" , xfs-dev Subject: REVIEW: Improve "." and ".." 
problem detection and handling in xfs_repair From: "Barry Naujok" Organization: SGI Content-Type: multipart/mixed; boundary=----------WpmrDzY9L2gCIlCYFMw2EE MIME-Version: 1.0 References: Date: Thu, 29 Nov 2007 15:13:43 +1100 Message-ID: In-Reply-To: User-Agent: Opera Mail/9.24 (Win32) X-Virus-Scanned: ClamAV 0.91.2/4948/Wed Nov 28 12:42:33 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13816 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: bnaujok@sgi.com Precedence: bulk X-list: xfs ------------WpmrDzY9L2gCIlCYFMw2EE Content-Type: text/plain; format=flowed; delsp=yes; charset=utf-8 Content-Transfer-Encoding: 7bit Ping? ------- Forwarded message ------- From: "Barry Naujok" To: "xfs@oss.sgi.com" , xfs-dev Cc: Subject: REVIEW: Improve "." and ".." problem detection and handling in xfs_repair Date: Thu, 20 Sep 2007 17:59:05 +1000 In a non-shortform directory (i.e. one whose contents are not stored within the on-disk inode), if "." or ".." are placed in different blocks, the directory cannot be made shortform when everything else in the directory is deleted, as two blocks still remain. The kernel then refuses to delete the directory, saying it's "not empty" (see http://oss.sgi.com/archives/xfs/2007-09/msg00142.html ). The attached patch detects "." or ".." not being in the first block and rebuilds the directory in repair so it may be deleted when required. Also, while testing this patch, I found that if ".." didn't exist in a directory, yet another directory had an entry for it, it was put in lost+found. The patch now puts ".." back in for the directory referencing it. I also improved the lookup of the parent when rebuilding a directory (it did a hash lookup of ".." instead of using the already existing get_inode_parent() call). 
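As a rough illustration of the per-chunk parent table that set_inode_parent() updates (the comment in incore_ino.c says the parent for the Nth bit set in the mask is stored in the Nth location of the array, with N starting at 0), here is a minimal standalone sketch. It is not the xfs_repair code. struct parent_table, slot_for(), parent_table_set(), parent_table_get() and NO_PARENT are hypothetical names made up for this example, NO_PARENT only stands in for NULLFSINO, and 64 inodes per chunk are assumed to match the 64-bit mask. Unlike the patch, it grows the array with realloc() rather than memalign() plus bcopy().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NO_PARENT UINT64_MAX	/* hypothetical stand-in for NULLFSINO */

/* Hypothetical, simplified stand-in for a per-chunk parent list. */
struct parent_table {
	uint64_t	mask;		/* bit N set: inode at offset N has a parent recorded */
	uint64_t	*entries;	/* packed array, one slot per set bit, in bit order */
	int		count;		/* number of valid slots in entries[] */
};

/* Slot index for an offset: the number of set mask bits below it. */
static int slot_for(const struct parent_table *t, int offset)
{
	int	slot = 0;
	int	i;

	for (i = 0; i < offset; i++)
		if (t->mask & (1ULL << i))
			slot++;
	return slot;
}

/* Record a parent for the inode at 'offset'; returns 0, or -1 if out of memory. */
static int parent_table_set(struct parent_table *t, int offset, uint64_t parent)
{
	uint64_t	*grown;
	int		slot;

	assert(offset >= 0 && offset < 64);
	slot = slot_for(t, offset);

	/* Bit already set: the slot exists, so just overwrite it. */
	if (t->mask & (1ULL << offset)) {
		t->entries[slot] = parent;
		return 0;
	}

	/* Otherwise grow by one and open a gap at 'slot' to keep bit order. */
	grown = realloc(t->entries, (size_t)(t->count + 1) * sizeof(*grown));
	if (grown == NULL)
		return -1;
	memmove(grown + slot + 1, grown + slot,
		(size_t)(t->count - slot) * sizeof(*grown));
	grown[slot] = parent;

	t->entries = grown;
	t->count++;
	t->mask |= 1ULL << offset;
	return 0;
}

/* Look up the parent for 'offset', or NO_PARENT if none was recorded. */
static uint64_t parent_table_get(const struct parent_table *t, int offset)
{
	assert(offset >= 0 && offset < 64);
	if (!(t->mask & (1ULL << offset)))
		return NO_PARENT;
	return t->entries[slot_for(t, offset)];
}
```

Starting from a zeroed struct parent_table, setting parents for offsets 5, 2 and 63 leaves three packed entries ordered by offset, and a lookup for any other offset returns NO_PARENT. That "no parent recorded" result is the case the patch below tests for with NULLFSINO before it sets the parent and marks the directory for rebuild.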
------------WpmrDzY9L2gCIlCYFMw2EE Content-Disposition: attachment; filename=improve_dot_and_dotdot_handling.patch Content-Type: text/x-patch; name=improve_dot_and_dotdot_handling.patch Content-Transfer-Encoding: Quoted-Printable =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D xfsprogs/repair/incore_ino.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- a/xfsprogs/repair/incore_ino.c 2007-09-20 17:45:32.000000000 +1000 +++ b/xfsprogs/repair/incore_ino.c 2007-09-19 15:48:06.845973418 +1000 @@ -621,49 +621,61 @@ print_uncertain_inode_list(xfs_agnumber_ * is the Nth bit set in the mask is stored in the Nth location in * the array where N starts at 0. 
*/ + void -set_inode_parent(ino_tree_node_t *irec, int offset, xfs_ino_t parent) -{ - int i; - int cnt; - int target; - __uint64_t bitmask; - parent_entry_t *tmp; +set_inode_parent( + ino_tree_node_t *irec, + int offset, + xfs_ino_t parent) +{ + parent_list_t *ptbl; + int i; + int cnt; + int target; + __uint64_t bitmask; + parent_entry_t *tmp; =20 - ASSERT(full_ino_ex_data =3D=3D 0); + if (full_ino_ex_data) + ptbl =3D irec->ino_un.ex_data->parents; + else + ptbl =3D irec->ino_un.plist; =20 - if (irec->ino_un.plist =3D=3D NULL) { - irec->ino_un.plist =3D - (parent_list_t*)malloc(sizeof(parent_list_t)); - if (!irec->ino_un.plist) + if (ptbl =3D=3D NULL) { + ptbl =3D (parent_list_t *)malloc(sizeof(parent_list_t)); + if (!ptbl) do_error(_("couldn't malloc parent list table\n")); =20 - irec->ino_un.plist->pmask =3D 1LL << offset; - irec->ino_un.plist->pentries =3D - (xfs_ino_t*)memalign(sizeof(xfs_ino_t), sizeof(xfs_ino_t)); - if (!irec->ino_un.plist->pentries) + if (full_ino_ex_data) + irec->ino_un.ex_data->parents =3D ptbl; + else + irec->ino_un.plist =3D ptbl; + + ptbl->pmask =3D 1LL << offset; + ptbl->pentries =3D (xfs_ino_t*)memalign(sizeof(xfs_ino_t), + sizeof(xfs_ino_t)); + if (!ptbl->pentries) do_error(_("couldn't memalign pentries table\n")); #ifdef DEBUG - irec->ino_un.plist->cnt =3D 1; + ptbl->cnt =3D 1; #endif - irec->ino_un.plist->pentries[0] =3D parent; + ptbl->pentries[0] =3D parent; =20 return; } =20 - if (irec->ino_un.plist->pmask & (1LL << offset)) { + if (ptbl->pmask & (1LL << offset)) { bitmask =3D 1LL; target =3D 0; =20 for (i =3D 0; i < offset; i++) { - if (irec->ino_un.plist->pmask & bitmask) + if (ptbl->pmask & bitmask) target++; bitmask <<=3D 1; } #ifdef DEBUG - ASSERT(target < irec->ino_un.plist->cnt); + ASSERT(target < ptbl->cnt); #endif - irec->ino_un.plist->pentries[target] =3D parent; + ptbl->pentries[target] =3D parent; =20 return; } @@ -672,7 +684,7 @@ set_inode_parent(ino_tree_node_t *irec,=20 cnt =3D target =3D 0; =20 for (i =3D 0; i < 
XFS_INODES_PER_CHUNK; i++) { - if (irec->ino_un.plist->pmask & bitmask) { + if (ptbl->pmask & bitmask) { cnt++; if (i < offset) target++; @@ -682,7 +694,7 @@ set_inode_parent(ino_tree_node_t *irec,=20 } =20 #ifdef DEBUG - ASSERT(cnt =3D=3D irec->ino_un.plist->cnt); + ASSERT(cnt =3D=3D ptbl->cnt); #endif ASSERT(cnt >=3D target); =20 @@ -690,23 +702,23 @@ set_inode_parent(ino_tree_node_t *irec,=20 if (!tmp) do_error(_("couldn't memalign pentries table\n")); =20 - (void) bcopy(irec->ino_un.plist->pentries, tmp, + (void) bcopy(ptbl->pentries, tmp, target * sizeof(parent_entry_t)); =20 if (cnt > target) - (void) bcopy(irec->ino_un.plist->pentries + target, + (void) bcopy(ptbl->pentries + target, tmp + target + 1, (cnt - target) * sizeof(parent_entry_t)); =20 - free(irec->ino_un.plist->pentries); + free(ptbl->pentries); =20 - irec->ino_un.plist->pentries =3D tmp; + ptbl->pentries =3D tmp; =20 #ifdef DEBUG - irec->ino_un.plist->cnt++; + ptbl->cnt++; #endif - irec->ino_un.plist->pentries[target] =3D parent; - irec->ino_un.plist->pmask |=3D (1LL << offset); + ptbl->pentries[target] =3D parent; + ptbl->pmask |=3D (1LL << offset); } =20 xfs_ino_t =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D xfsprogs/repair/phase6.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- a/xfsprogs/repair/phase6.c 2007-09-20 17:45:33.000000000 +1000 +++ b/xfsprogs/repair/phase6.c 2007-09-19 18:15:06.105909485 +1000 @@ -1624,16 +1624,25 @@ lf_block_dir_entry_check(xfs_mount_t *m */ if (is_inode_reached(irec, ino_offset)) { junkit =3D 1; - do_warn( -_("entry \"%s\" in dir %llu points to an already connected dir inode %llu,= \n"), + 
do_warn(_("entry \"%s\" in dir %llu points to an " + "already connected dir inode %llu,\n"), fname, ino, lino); } else if (parent =3D=3D ino) { add_inode_reached(irec, ino_offset); add_inode_ref(current_irec, current_ino_offset); - } else { + } else if (parent =3D=3D NULLFSINO) { + /* ".." was missing, but this entry refers to it, + so, set it as the parent and mark for rebuild */ + do_warn(_("entry \"%s\" in dir ino %llu doesn't have a" + " .. entry, will set it in ino %llu.\n"), + fname, ino, lino); + set_inode_parent(irec, ino_offset, ino); + add_inode_reached(irec, ino_offset); + add_inode_ref(current_irec, current_ino_offset); + } else { junkit =3D 1; - do_warn( -_("entry \"%s\" in dir ino %llu not consistent with .. value (%llu) in ino= %llu,\n"), + do_warn(_("entry \"%s\" in dir ino %llu not consistent" + " with .. value (%llu) in ino %llu,\n"), fname, ino, parent, lino); } =20 @@ -1788,10 +1797,12 @@ _("can't map leaf block %d in dir %llu,=20 =20 static void longform_dir2_rebuild( - xfs_mount_t *mp, - xfs_ino_t ino, - xfs_inode_t *ip, - dir_hash_tab_t *hashtab) + xfs_mount_t *mp, + xfs_ino_t ino, + xfs_inode_t *ip, + ino_tree_node_t *irec, + int ino_offset, + dir_hash_tab_t *hashtab) { int error; int nres; @@ -1800,7 +1811,6 @@ longform_dir2_rebuild( xfs_fsblock_t firstblock; xfs_bmap_free_t flist; xfs_inode_t pip; - int byhash; dir_hash_ent_t *p; int committed; int done; @@ -1813,19 +1823,14 @@ longform_dir2_rebuild( do_warn(_("rebuilding directory inode %llu\n"), ino); =20 /* - * first attempt to locate the parent inode, if it can't be found, - * set it to the root inode and it'll be adjusted or fixed later - * if incorrect (the inode number here needs to be valid for the - * libxfs_dir2_init() call). - */ - byhash =3D DIR_HASH_FUNC(hashtab, libxfs_da_hashname((uchar_t*)"..", 2)); - pip.i_ino =3D mp->m_sb.sb_rootino; - for (p =3D hashtab->byhash[byhash]; p; p =3D p->nextbyhash) { - if (p->namelen =3D=3D 2 && p->name[0] =3D=3D '.' 
&& p->name[1] =3D=3D '.= ') { - pip.i_ino =3D p->inum; - break; - } - } + * first attempt to locate the parent inode, if it can't be + * found, set it to the root inode and it'll be moved to the + * orphanage later (the inode number here needs to be valid + * for the libxfs_dir2_init() call). + */ + pip.i_ino =3D get_inode_parent(irec, ino_offset); + if (pip.i_ino =3D=3D NULLFSINO) + pip.i_ino =3D mp->m_sb.sb_rootino; =20 XFS_BMAP_INIT(&flist, &firstblock); =20 @@ -2273,11 +2278,25 @@ longform_dir2_entry_check_data( * skip the '..' entry since it's checked when the * directory is reached by something else. if it never * gets reached, it'll be moved to the orphanage and we'll - * take care of it then. + * take care of it then. If it doesn't exist at all, the + * directory needs to be rebuilt first before being added + * to the orphanage. */ if (dep->namelen =3D=3D 2 && dep->name[0] =3D=3D '.' && - dep->name[1] =3D=3D '.') + dep->name[1] =3D=3D '.') { + if (da_bno !=3D 0) { + /* ".." should be in the first block */ + nbad++; + if (entry_junked(_("entry \"%s\" (ino %llu) " + "in dir %llu is not in the " + "the first block"), fname, + inum, ip->i_ino)) { + dep->name[0] =3D '/'; + libxfs_dir2_data_log_entry(tp, bp, dep); + } + } continue; + } ASSERT(no_modify || !verify_inum(mp, inum)); /* * special case the . entry. we know there's only one @@ -2291,6 +2310,16 @@ longform_dir2_entry_check_data( if (ip->i_ino =3D=3D inum) { ASSERT(dep->name[0] =3D=3D '.' && dep->namelen =3D=3D 1); add_inode_ref(current_irec, current_ino_offset); + if (da_bno !=3D 0 || dep !=3D (xfs_dir2_data_entry_t *)d->u) { + /* "." 
should be the first entry */ + nbad++; + if (entry_junked(_("entry \"%s\" in dir %llu is " + "not the first entry"), + fname, inum, ip->i_ino)) { + dep->name[0] =3D '/'; + libxfs_dir2_data_log_entry(tp, bp, dep); + } + } *need_dot =3D 0; continue; } @@ -2325,6 +2354,15 @@ _("entry \"%s\" in dir %llu points to an } else if (parent =3D=3D ip->i_ino) { add_inode_reached(irec, ino_offset); add_inode_ref(current_irec, current_ino_offset); + } else if (parent =3D=3D NULLFSINO) { + /* ".." was missing, but this entry refers to it, + so, set it as the parent and mark for rebuild */ + do_warn(_("entry \"%s\" in dir ino %llu doesn't have a" + " .. entry, will set it in ino %llu.\n"), + fname, ip->i_ino, inum); + set_inode_parent(irec, ino_offset, ip->i_ino); + add_inode_reached(irec, ino_offset); + add_inode_ref(current_irec, current_ino_offset); } else { junkit =3D 1; do_warn( @@ -2637,7 +2675,7 @@ longform_dir2_entry_check(xfs_mount_t *m irec, ino_offset, &bplist[db], hashtab, &freetab, da_bno, isblock); } - fixit =3D (*num_illegal !=3D 0) || dir2_is_badino(ino); + fixit =3D (*num_illegal !=3D 0) || dir2_is_badino(ino) || *need_dot; =20 /* check btree and freespace */ if (isblock) { @@ -2659,8 +2697,9 @@ longform_dir2_entry_check(xfs_mount_t *m for (i =3D 0; i < freetab->naents; i++) if (bplist[i]) libxfs_da_brelse(NULL, bplist[i]); - longform_dir2_rebuild(mp, ino, ip, hashtab); + longform_dir2_rebuild(mp, ino, ip, irec, ino_offset, hashtab); *num_illegal =3D 0; + *need_dot =3D 0; } else { for (i =3D 0; i < freetab->naents; i++) if (bplist[i]) @@ -2865,6 +2904,15 @@ shortform_dir_entry_check(xfs_mount_t *m } else if (parent =3D=3D ino) { add_inode_reached(irec, ino_offset); add_inode_ref(current_irec, current_ino_offset); + } else if (parent =3D=3D NULLFSINO) { + /* ".." was missing, but this entry refers to it, + so, set it as the parent and mark for rebuild */ + do_warn(_("entry \"%s\" in dir ino %llu doesn't have a" + " .. 
entry, will set it in ino %llu.\n"), + fname, ino, lino); + set_inode_parent(irec, ino_offset, ino); + add_inode_reached(irec, ino_offset); + add_inode_ref(current_irec, current_ino_offset); } else { junkit =3D 1; do_warn(_("entry \"%s\" in dir %llu not " @@ -3257,6 +3305,15 @@ shortform_dir2_entry_check(xfs_mount_t * } else if (parent =3D=3D ino) { add_inode_reached(irec, ino_offset); add_inode_ref(current_irec, current_ino_offset); + } else if (parent =3D=3D NULLFSINO) { + /* ".." was missing, but this entry refers to it, + so, set it as the parent and mark for rebuild */ + do_warn(_("entry \"%s\" in dir ino %llu doesn't have a" + " .. entry, will set it in ino %llu.\n"), + fname, ino, lino); + set_inode_parent(irec, ino_offset, ino); + add_inode_reached(irec, ino_offset); + add_inode_ref(current_irec, current_ino_offset); } else { junkit =3D 1; do_warn(_("entry \"%s\" in directory inode %llu" ------------WpmrDzY9L2gCIlCYFMw2EE-- From owner-xfs@oss.sgi.com Thu Nov 29 12:47:22 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 29 Nov 2007 12:47:25 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: * X-Spam-Status: No, score=1.8 required=5.0 tests=AWL,BAYES_00, RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_PSBL autolearn=no version=3.3.0-r574664 Received: from ogre.sisk.pl (ogre.sisk.pl [217.79.144.158]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lATKlIo3009936 for ; Thu, 29 Nov 2007 12:47:21 -0800 Received: from localhost (localhost.localdomain [127.0.0.1]) by ogre.sisk.pl (Postfix) with ESMTP id 956916B166; Thu, 29 Nov 2007 21:43:51 +0100 (CET) Received: from ogre.sisk.pl ([127.0.0.1]) by localhost (ogre.sisk.pl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 03945-06; Thu, 29 Nov 2007 21:43:33 +0100 (CET) Received: from [192.168.100.119] (nat-be3.aster.pl [212.76.37.200]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by ogre.sisk.pl (Postfix) 
with ESMTP id 9CF7A6ED64; Thu, 29 Nov 2007 21:43:24 +0100 (CET) From: "Rafael J. Wysocki" To: Tino Keitel Subject: Re: XFS related Oops (suspend/resume related) Date: Thu, 29 Nov 2007 22:05:24 +0100 User-Agent: KMail/1.9.6 (enterprise 20070904.708012) Cc: David Chinner , linux-kernel@vger.kernel.org, xfs@oss.sgi.com References: <20071112064706.GA23595@dose.home.local> <20071127211155.GK119954183@sgi.com> <200711272253.01136.rjw@sisk.pl> In-Reply-To: <200711272253.01136.rjw@sisk.pl> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200711292205.25431.rjw@sisk.pl> X-Virus-Scanned: ClamAV 0.91.2/4954/Thu Nov 29 09:46:26 2007 on oss.sgi.com X-Virus-Scanned: amavisd-new at ogre.sisk.pl using MkS_Vir for Linux X-Virus-Status: Clean X-archive-position: 13817 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: rjw@sisk.pl Precedence: bulk X-list: xfs On Tuesday, 27 of November 2007, Rafael J. Wysocki wrote: > On Tuesday, 27 of November 2007, David Chinner wrote: > > On Tue, Nov 27, 2007 at 04:51:38PM +0100, Rafael J. Wysocki wrote: > > > On Monday, 26 of November 2007, Rafael J. Wysocki wrote: > > > > On Monday, 26 of November 2007, David Chinner wrote: > > > > > Now there's a message that I haven't seen in about 3 years. > > > > > > > > > > It indicates that the linux inode connected to the xfs_inode is not > > > > > the correct one. i.e. that the linux inode cache is out of step with > > > > > the XFS inode cache. > > > > > > > > > > Basically, that is not supposed to happen. I suspect that the way > > > > > threads are frozen is resulting in an inode lookup racing with > > > > > a reclaim. The reclaim thread gets stopped after any use threads, > > > > > and so we could have the situation that a process blocked in lookup > > > > > has the XFS inode reclaimed and reused before it gets unblocked. 
> > > > > > > > > > The question is why is it happening now when none of that code in > > > > > XFS has changed? > > > > > > > > > > Rafael, when are threads frozen? Only when they schedule or call > > > > > try_to_freeze()? > > > > > > > > Kernel threads freeze only when they call try_to_freeze(). User space tasks > > > > freeze while executing the signals handling code. > > > > > > > > > Did the freezer mechanism change in 2.6.23 (this is on 2.6.23.1)? > > > > > > > > Yes. Kernel threads are not sent fake signals by the freezer any more. > > > > > > Ah, sorry, this change has been merged after 2.6.23. However, before 2.6.23 > > > we had another important change that caused all kernel threads to have > > > PF_NOFREEZE set by default, unless they call set_freezable() explicitly. > > > > So try_to_freeze() will never freeze a thread if it has not been > > set_freezable()? And xfsbufd will never be frozen? > > No, it won't. > > I must have overlooked it, probably because it calls refrigerator() directly > and not try_to_freeze() ... > > I think something like the appended patch will help, then. Tino, can you check if this patch helps, please? Greetings, Rafael > --- > Fix breakage caused by commit 831441862956fffa17b9801db37e6ea1650b0f69 > that did not introduce the necessary call to set_freezable() in > xfs/linux-2.6/xfs_buf.c . > > Signed-off-by: Rafael J. 
Wysocki > --- > fs/xfs/linux-2.6/xfs_buf.c | 2 ++ > 1 file changed, 2 insertions(+) > > Index: linux-2.6/fs/xfs/linux-2.6/xfs_buf.c > =================================================================== > --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_buf.c > +++ linux-2.6/fs/xfs/linux-2.6/xfs_buf.c > @@ -1750,6 +1750,8 @@ xfsbufd( > > current->flags |= PF_MEMALLOC; > > + set_freezable(); > + > do { > if (unlikely(freezing(current))) { > set_bit(XBT_FORCE_SLEEP, &target->bt_flags); > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > > -- "Premature optimization is the root of all evil." - Donald Knuth From owner-xfs@oss.sgi.com Thu Nov 29 15:45:04 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 29 Nov 2007 15:45:08 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,SPF_HELO_PASS autolearn=ham version=3.3.0-r574664 Received: from mail.g-house.de (ns2.g-housing.de [81.169.133.75]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lATNj1wC000844 for ; Thu, 29 Nov 2007 15:45:04 -0800 Received: from [89.59.6.151] (helo=[192.168.178.25]) by mail.g-house.de with esmtpsa (TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.63) (envelope-from ) id 1Ixt87-00007s-41; Fri, 30 Nov 2007 00:49:15 +0100 Date: Fri, 30 Nov 2007 00:45:05 +0100 (CET) From: Christian Kujau X-X-Sender: evil@sheep.housecafe.de To: Christoph Hellwig cc: linux-xfs@oss.sgi.com, LKML Subject: Re: [PATCH] xfs: revert to double-buffering readdir In-Reply-To: <20071125163014.GA17922@infradead.org> Message-ID: References: <20071114070400.GA25708@puku.stupidest.org> <20071125163014.GA17922@infradead.org> User-Agent: Alpine 0.99999 (DEB 796 2007-11-08) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; 
format=flowed; charset=US-ASCII X-Virus-Scanned: ClamAV 0.91.2/4954/Thu Nov 29 09:46:26 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13818 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lists@nerdbynature.de Precedence: bulk X-list: xfs On Sun, 25 Nov 2007, Christoph Hellwig wrote: > This patch does exactly that and reverts xfs_file_readdir to what's > basically the 2.6.23 version minus the uio and vnops junk. Thanks, works here too (without nordirplus as a mountoption). Am I supposed to close the bug[0] or do you guys want to leave this open to track the Real Fix (TM) for 2.6.25? Again, thank you for the fix! Christian. [0] http://bugzilla.kernel.org/show_bug.cgi?id=9400 -- BOFH excuse #112: The monitor is plugged into the serial port From owner-xfs@oss.sgi.com Thu Nov 29 19:30:37 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 29 Nov 2007 19:30:42 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAU3UVgh030367 for ; Thu, 29 Nov 2007 19:30:35 -0800 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA16920; Fri, 30 Nov 2007 14:30:35 +1100 Received: by chook.melbourne.sgi.com (Postfix, from userid 1116) id AFC5C58C4C0F; Fri, 30 Nov 2007 14:30:35 +1100 (EST) To: sgi.bugs.xfs@engr.sgi.com, xfs@oss.sgi.com Subject: PARTIAL TAKE 971186 - remove CFORK macros Message-Id: <20071130033035.AFC5C58C4C0F@chook.melbourne.sgi.com> Date: Fri, 30 Nov 2007 14:30:35 +1100 (EST) From: tes@sgi.com (Tim Shimmin) X-Virus-Scanned: ClamAV 0.91.2/4956/Thu Nov 29 16:56:01 2007 on oss.sgi.com 
X-Virus-Status: Clean X-archive-position: 13819 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: tes@sgi.com Precedence: bulk X-list: xfs Remove CFORK macros and use code directly in IFORK and DFORK macros. Currently XFS_IFORK_* and XFS_DFORK* are implemented by means of XFS_CFORK* macros. But given that XFS_IFORK_* operates on an xfs_inode that embeds an xfs_icdinode_core and XFS_DFORK_* operates on an xfs_dinode that embeds an xfs_dinode_core, one will have to do endian swapping while the other doesn't. Instead of having the current mess with the CFORK macros that have byteswapping and non-byteswapping versions (which are inconsistently named while we're at it) just define each family of the macros to stand by itself and simplify the whole matter. A few direct references to the CFORK variants were cleaned up to use IFORK or DFORK to make this possible. Signed-off-by: Christoph Hellwig Date: Fri Nov 30 14:28:54 AEDT 2007 Workarea: chook.melbourne.sgi.com:/build/tes/2.6.x-xfs-quilt Inspected by: hch@lst.de The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/linux/2.6.x-xfs-melb Modid: xfs-linux-melb:xfs-kern:30163a fs/xfs/xfs_itable.c - 1.160 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_itable.c.diff?r1=text&tr1=1.160&r2=text&tr2=1.159&f=h fs/xfs/xfs_inode.c - 1.488 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_inode.c.diff?r1=text&tr1=1.488&r2=text&tr2=1.487&f=h fs/xfs/xfs_inode.h - 1.239 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_inode.h.diff?r1=text&tr1=1.239&r2=text&tr2=1.238&f=h fs/xfs/xfs_dinode.h - 1.84 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_dinode.h.diff?r1=text&tr1=1.84&r2=text&tr2=1.83&f=h fs/xfs/dmapi/xfs_dm.c - 1.60 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/dmapi/xfs_dm.c.diff?r1=text&tr1=1.60&r2=text&tr2=1.59&f=h - Remove CFORK macros and use code directly in IFORK and
DFORK macros. From owner-xfs@oss.sgi.com Thu Nov 29 21:09:27 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 29 Nov 2007 21:09:35 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_43 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAU59Nqc014183 for ; Thu, 29 Nov 2007 21:09:25 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA18944; Fri, 30 Nov 2007 15:58:12 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAU4wAdD125988737; Fri, 30 Nov 2007 15:58:11 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAU4w8qS126451235; Fri, 30 Nov 2007 15:58:08 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Fri, 30 Nov 2007 15:58:08 +1100 From: David Chinner To: Bernd Schubert Cc: linux-xfs@oss.sgi.com Subject: Re: XFS performance problems on Linux x86_64 Message-ID: <20071130045808.GK119954183@sgi.com> References: <474C8A05.3020604@e-626.net> <20071127220536.GL119954183@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4956/Thu Nov 29 16:56:01 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13820 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Wed, Nov 28, 2007 at 12:13:57AM +0100, Bernd Schubert wrote: > Hello David, > > David Chinner wrote: > > > > # mkfs.xfs -f -l 
lazy-count=1,version=2,size=128m -i attr=2 -d agcount=4 > > # mount -o logbsize=256k > > thanks, I was also going to ask which are optimal parameters. Just didn't > have the time yet :) > Any idea when these options will be default? They should already be the defaults in the current CVS tree. ;) Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Thu Nov 29 23:20:18 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 29 Nov 2007 23:20:21 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.4 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_33, J_CHICKENPOX_34,J_CHICKENPOX_36,J_CHICKENPOX_66 autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAU7KC4J028179 for ; Thu, 29 Nov 2007 23:20:17 -0800 Received: from timothy-shimmins-power-mac-g5.local (boing.melbourne.sgi.com [134.14.55.141]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id SAA21951; Fri, 30 Nov 2007 18:20:12 +1100 Message-ID: <474FBA21.4070201@sgi.com> Date: Fri, 30 Nov 2007 18:22:09 +1100 From: Timothy Shimmin User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728) MIME-Version: 1.0 To: Christoph Hellwig CC: Chris Wedgwood , linux-xfs@oss.sgi.com, LKML Subject: Re: [PATCH] xfs: revert to double-buffering readdir References: <20071114070400.GA25708@puku.stupidest.org> <20071125163014.GA17922@infradead.org> In-Reply-To: <20071125163014.GA17922@infradead.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.91.2/4956/Thu Nov 29 16:56:01 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13821 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: tes@sgi.com Precedence: bulk 
X-list: xfs Christoph Hellwig wrote: > The current readdir implementation deadlocks on a btree buffers locks > because nfsd calls back into ->lookup from the filldir callback. The > only short-term fix for this is to revert to the old inefficient > double-buffering scheme. > Probably why Steve did this: :) xfs_file.c ---------------------------- revision 1.40 date: 2001/03/15 23:33:20; author: lord; state: Exp; lines: +54 -17 modid: 2.4.x-xfs:slinx:90125a Change linvfs_readdir to allocate a buffer, call xfs to fill it, and then call the filldir function on each entry. This is instead of doing the filldir deep in the bowels of xfs which causes locking problems. ---------------------------- Yes it looks like it is done equivalently to before (minus the uio stuff etc). I don't know what the 7fff* masking is about but we did that previously. I hadn't come across the name[] struct field before, was used to name[0] (or name[1] in times gone by) but found that is a kosher way of doing things too for the variable len string at the end. Hmmm, don't see the point of "eof" local var now. Previously bhv_vop_readdir() returned eof. I presume if we don't move the offset (offset == startoffset) then we're done and break out? So we lost eof when going to the filldir in the getdents code etc... --Tim > This patch does exactly that and reverts xfs_file_readdir to what's > basically the 2.6.23 version minus the uio and vnops junk. > > I'll try to find something more optimal for 2.6.25 or at least find a > way to use the proper version for local access. 
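The buffering scheme in the patch quoted in this message, and the name[] flexible array member mentioned above, can be tried out in user space. What follows is only a minimal sketch, not the kernel code: struct rec, struct rec_buf, rec_buf_init(), rec_fill(), rec_replay() and count_emit() are hypothetical stand-ins for hack_dirent, hack_callback, xfs_hack_filldir() and the loop that replays entries into filldir. One deliberate difference is an assumption of this sketch: each record is rounded up to 8 bytes so the next header stays aligned, whereas the patch advances by the unpadded sizeof(struct hack_dirent) + namlen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space stand-in for the patch's struct hack_dirent. */
struct rec {
	int		namlen;
	int64_t		offset;
	uint64_t	ino;
	unsigned int	d_type;
	char		name[];		/* C99 flexible array member */
};

/* Hypothetical stand-in for struct hack_callback. */
struct rec_buf {
	char	*data;
	size_t	len;
	size_t	used;
};

/* Callback type used when replaying buffered entries (plays the role of filldir). */
typedef int (*emit_fn)(void *ctx, const char *name, int namlen,
		       int64_t offset, uint64_t ino, unsigned int d_type);

/* Allocate the staging buffer; returns 0 on success, -1 on failure. */
static int rec_buf_init(struct rec_buf *buf, size_t len)
{
	buf->data = malloc(len);
	buf->len = len;
	buf->used = 0;
	return buf->data ? 0 : -1;
}

/* Record size rounded up to 8 bytes (sketch-only choice, see lead-in). */
static size_t rec_len(int namlen)
{
	return (sizeof(struct rec) + (size_t)namlen + 7) & ~(size_t)7;
}

/* Phase 1: copy one entry into the buffer; -1 means it does not fit. */
static int rec_fill(struct rec_buf *buf, const char *name, int namlen,
		    int64_t offset, uint64_t ino, unsigned int d_type)
{
	size_t reclen = rec_len(namlen);
	struct rec *de;

	assert(namlen >= 0);
	if (buf->used + reclen > buf->len)
		return -1;

	de = (struct rec *)(buf->data + buf->used);
	de->namlen = namlen;
	de->offset = offset;
	de->ino = ino;
	de->d_type = d_type;
	memcpy(de->name, name, (size_t)namlen);
	buf->used += reclen;
	return 0;
}

/*
 * Phase 2: walk the packed records and hand each one to emit().
 * Stops early when emit() returns nonzero; returns the number of
 * entries that were accepted.
 */
static size_t rec_replay(const struct rec_buf *buf, emit_fn emit, void *ctx)
{
	size_t pos = 0, delivered = 0;

	while (pos < buf->used) {
		const struct rec *de = (const struct rec *)(buf->data + pos);

		if (emit(ctx, de->name, de->namlen, de->offset,
			 de->ino, de->d_type))
			break;
		pos += rec_len(de->namlen);
		delivered++;
	}
	return delivered;
}

/* Example consumer: accepts entries until a limit is reached. */
struct count_ctx {
	size_t	seen;
	size_t	limit;
};

static int count_emit(void *ctx, const char *name, int namlen,
		      int64_t offset, uint64_t ino, unsigned int d_type)
{
	struct count_ctx *c = ctx;

	(void)name; (void)namlen; (void)offset; (void)ino; (void)d_type;
	if (c->seen == c->limit)
		return 1;	/* consumer is full, like a nonzero filldir */
	c->seen++;
	return 0;
}
```

A caller would fill the buffer with rec_fill() while holding whatever locks the producer needs, drop them, and only then call rec_replay(); that separation is the point of the double buffering, since the consumer callback never runs while the producer's locks are held.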
>
>
> Signed-off-by: Christoph Hellwig
>
> Index: linux-2.6/fs/xfs/linux-2.6/xfs_file.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_file.c	2007-11-25 11:41:20.000000000 +0100
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_file.c	2007-11-25 17:14:27.000000000 +0100
> @@ -218,6 +218,15 @@
>  }
>  #endif /* CONFIG_XFS_DMAPI */
>  
> +/*
> + * Unfortunately we can't just use the clean and simple readdir implementation
> + * below, because nfs might call back into ->lookup from the filldir callback
> + * and that will deadlock the low-level btree code.
> + *
> + * Hopefully we'll find a better workaround that allows to use the optimal
> + * version at least for local readdirs for 2.6.25.
> + */
> +#if 0
>  STATIC int
>  xfs_file_readdir(
>  	struct file	*filp,
> @@ -249,6 +258,121 @@
>  		return -error;
>  	return 0;
>  }
> +#else
> +
> +struct hack_dirent {
> +	int		namlen;
> +	loff_t		offset;
> +	u64		ino;
> +	unsigned int	d_type;
> +	char		name[];
> +};
> +
> +struct hack_callback {
> +	char		*dirent;
> +	size_t		len;
> +	size_t		used;
> +};
> +
> +STATIC int
> +xfs_hack_filldir(
> +	void		*__buf,
> +	const char	*name,
> +	int		namlen,
> +	loff_t		offset,
> +	u64		ino,
> +	unsigned int	d_type)
> +{
> +	struct hack_callback *buf = __buf;
> +	struct hack_dirent *de = (struct hack_dirent *)(buf->dirent + buf->used);
> +
> +	if (buf->used + sizeof(struct hack_dirent) + namlen > buf->len)
> +		return -EINVAL;
> +
> +	de->namlen = namlen;
> +	de->offset = offset;
> +	de->ino = ino;
> +	de->d_type = d_type;
> +	memcpy(de->name, name, namlen);
> +	buf->used += sizeof(struct hack_dirent) + namlen;
> +	return 0;
> +}
> +
> +STATIC int
> +xfs_file_readdir(
> +	struct file	*filp,
> +	void		*dirent,
> +	filldir_t	filldir)
> +{
> +	struct inode	*inode = filp->f_path.dentry->d_inode;
> +	xfs_inode_t	*ip = XFS_I(inode);
> +	struct hack_callback buf;
> +	struct hack_dirent *de;
> +	int		error;
> +	loff_t		size;
> +	int		eof = 0;
> +	xfs_off_t	start_offset, curr_offset, offset;
> +
> +	/*
> +	 * Try fairly hard to get memory
> +	 */
> +	buf.len = PAGE_CACHE_SIZE;
> +	do {
> +		buf.dirent = kmalloc(buf.len, GFP_KERNEL);
> +		if (buf.dirent)
> +			break;
> +		buf.len >>= 1;
> +	} while (buf.len >= 1024);
> +
> +	if (!buf.dirent)
> +		return -ENOMEM;
> +
> +	curr_offset = filp->f_pos;
> +	if (curr_offset == 0x7fffffff)
> +		offset = 0xffffffff;
> +	else
> +		offset = filp->f_pos;
> +
> +	while (!eof) {
> +		int reclen;
> +		start_offset = offset;
> +
> +		buf.used = 0;
> +		error = -xfs_readdir(ip, &buf, buf.len, &offset,
> +				     xfs_hack_filldir);
> +		if (error || offset == start_offset) {
> +			size = 0;
> +			break;
> +		}
> +
> +		size = buf.used;
> +		de = (struct hack_dirent *)buf.dirent;
> +		while (size > 0) {
> +			if (filldir(dirent, de->name, de->namlen,
> +					curr_offset & 0x7fffffff,
> +					de->ino, de->d_type)) {
> +				goto done;
> +			}
> +
> +			reclen = sizeof(struct hack_dirent) + de->namlen;
> +			size -= reclen;
> +			curr_offset = de->offset /* & 0x7fffffff */;
> +			de = (struct hack_dirent *)((char *)de + reclen);
> +		}
> +	}
> +
> + done:
> +	if (!error) {
> +		if (size == 0)
> +			filp->f_pos = offset & 0x7fffffff;
> +		else if (de)
> +			filp->f_pos = curr_offset;
> +	}
> +
> +	kfree(buf.dirent);
> +	return error;
> +}
> +#endif
>  
>  STATIC int
>  xfs_file_mmap(
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

From owner-xfs@oss.sgi.com Thu Nov 29 23:33:07 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 29 Nov 2007 23:33:10 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=0.3 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_43, MIME_8BIT_HEADER autolearn=no version=3.3.0-r574664 Received: from mars.puhasoft.hu (mars.puhasoft.hu [212.108.197.33]) by oss.sgi.com
(8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAU7X3Gj029677 for ; Thu, 29 Nov 2007 23:33:06 -0800 Received: from localhost (localhost [127.0.0.1]) by mars.puhasoft.hu (Postfix) with ESMTP id 2DB8B40B79C; Fri, 30 Nov 2007 08:17:45 +0100 (CET) Received: from mars.puhasoft.hu ([127.0.0.1]) by localhost (mars.puhasoft.hu [127.0.0.1]) (amavisd-maia, port 10024) with ESMTP id 32187-03; Fri, 30 Nov 2007 08:17:35 +0100 (CET) Received: from [10.1.0.13] (pool-2694.adsl.interware.hu [213.178.110.134]) by mars.puhasoft.hu (Postfix) with ESMTP id AE52E40B79A; Fri, 30 Nov 2007 08:17:35 +0100 (CET) Message-ID: <474FB90C.4060905@tsabi.hu> Date: Fri, 30 Nov 2007 08:17:32 +0100 From: =?UTF-8?B?VMOzdGggQ3NhYmE=?= User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; hu; rv:1.8.1.6) Gecko/20070728 Thunderbird/2.0.0.6 Mnenhy/0.7.5.666 MIME-Version: 1.0 To: David Chinner CC: Bernd Schubert , linux-xfs@oss.sgi.com Subject: Re: XFS performance problems on Linux x86_64 References: <474C8A05.3020604@e-626.net> <20071127220536.GL119954183@sgi.com> <20071130045808.GK119954183@sgi.com> In-Reply-To: <20071130045808.GK119954183@sgi.com> X-Face: =`Ln&-|yFvo0~G^%,v)#*1!$]jonPI@q#'MN{TdjKPSzyw'{zP{-\wYQ.Sg:_D1xQ|Fb?(:$_/D>DHVlzAozh2TeKwt@?T.~@m][wov3Uv=UV(!h&j6uj^yOu1B3%+ddo;Gl"pSK Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Virus-Scanned: ClamAV 0.91.2/4956/Thu Nov 29 16:56:01 2007 on oss.sgi.com X-Virus-Scanned: Maia Mailguard 1.0.2 X-Virus-Status: Clean X-archive-position: 13822 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: tsabi@tsabi.hu Precedence: bulk X-list: xfs Hello list, I tried these parameters and got the results below with bonnie++. As far as I can tell there isn't any speedup with these parameters, or am I doing something wrong?
tsabi

oldbck ~ # uname -a
Linux oldbck 2.6.23-gentoo-r2-uk-01 #1 SMP Tue Nov 20 03:43:04 CET 2007 x86_64 Intel(R) Xeon(TM) CPU 2.80GHz GenuineIntel GNU/Linux

test 1:

oldbck mnt # mkfs.xfs -i size=512 -f /dev/md5
meta-data=/dev/md5               isize=512    agcount=32, agsize=4513008 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=144416192, imaxpct=25
         =                       sunit=16     swidth=64 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal log           bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=262144 blocks=0, rtextents=0
oldbck mnt # mount /dev/md5 /mnt/data
oldbck mnt # bonnie++ -u root -d /mnt/data
Version 1.93c       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
oldbck           4G   522  99 235160 50 86692  17  1040  96 241681 22 457.9   9
Latency               15620us     205ms     119ms     100ms   50727us   79657us
Version 1.93c       ------Sequential Create------ --------Random Create--------
oldbck              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 11000  77 +++++ +++ 13765  82 11977  83 +++++ +++ 11049  77
Latency               66657us      49us   76936us   68300us      18us   72243us
1.93c,1.93c,oldbck,1,1196392150,4G,,522,99,235160,50,86692,17,1040,96,241681,22,457.9,9,16,,,,,11000,77,+++++,+++,13765,82,11977,83,+++++,+++,11049,77,15620us,205ms,119ms,100ms,50727us,79657us,66657us,49us,76936us,68300us,18us,72243us

test 2:

oldbck mnt # mkfs.xfs -f -l lazy-count=1,version=2,size=128m -i attr=2,size=512 -d agcount=4 /dev/md5
meta-data=/dev/md5               isize=512    agcount=4, agsize=36104048 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=144416192, imaxpct=25
         =                       sunit=16     swidth=64 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal log           bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=262144 blocks=0, rtextents=0
oldbck mnt # mount -o logbsize=256k,nobarrier /dev/md5 /mnt/data
oldbck mnt # bonnie++ -u root -d /mnt/data
Version 1.93c       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
oldbck           4G   523  99 237016 44 87202  17  1040  96 245389 22 446.5   7
Latency               15531us     184ms     133ms     105ms   11835us   85541us
Version 1.93c       ------Sequential Create------ --------Random Create--------
oldbck              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 12772  77 +++++ +++ 14759  74 13499  81 +++++ +++ 11617  70
Latency               86835us      56us   79869us   79967us      24us   95818us
1.93c,1.93c,oldbck,1,1196391889,4G,,523,99,237016,44,87202,17,1040,96,245389,22,446.5,7,16,,,,,12772,77,+++++,+++,14759,74,13499,81,+++++,+++,11617,70,15531us,184ms,133ms,105ms,11835us,85541us,86835us,56us,79869us,79967us,24us,95818us

test 3:

oldbck mnt # mkfs.ext3 /dev/md5
mke2fs 1.40.2 (12-Jul-2007)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
72220672 inodes, 144416192 blocks
7220809 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
4408 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 32 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
oldbck mnt # mount /dev/md5 /mnt/data
oldbck mnt #
oldbck mnt # bonnie++ -u root -d /mnt/data
Version 1.93c       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
oldbck           4G   326  99 211619 80 91680  23  1341  96 237493 21 495.5   9
Latency               32953us     220ms    1599ms   62374us   59342us     472ms
Version 1.93c       ------Sequential Create------ --------Random Create--------
oldbck              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency                7677us     165us     252us   17010us      13us     243us
1.93c,1.93c,oldbck,1,1196389368,4G,,326,99,211619,80,91680,23,1341,96,237493,21,495.5,9,16,,,,,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,32953us,220ms,1599ms,62374us,59342us,472ms,7677us,165us,252us,17010us,13us,243us

David Chinner wrote:
> On Wed, Nov 28, 2007 at 12:13:57AM +0100, Bernd Schubert wrote:
>> Hello David,
>>
>> David Chinner wrote:
>>> # mkfs.xfs -f -l lazy-count=1,version=2,size=128m -i attr=2 -d agcount=4
>>> # mount -o logbsize=256k
>> thanks, I was also going to ask which are optimal parameters. Just didn't
>> have the time yet :)
>> Any idea when these options will be default?
>
> They should already be the defaults in the current CVS tree. ;)
>
> Cheers,
>
> Dave.
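To judge whether the two XFS runs in this message differ, it helps to reduce the tables to relative changes. The following is a small sketch for that arithmetic only; struct metric, pct_change() and print_changes() are hypothetical helper names, and the numbers are copied from the test 1 and test 2 output above. Note that test 2 was also mounted with nobarrier, so any difference cannot be attributed to the mkfs options alone.

```c
#include <assert.h>
#include <stdio.h>

/* One bonnie++ figure measured under two configurations. */
struct metric {
	const char	*name;
	double		before;	/* test 1: mkfs.xfs -i size=512 only */
	double		after;	/* test 2: tuned mkfs and mount options */
};

/* Relative change from before to after, in percent. */
static double pct_change(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}

/* Values copied from the two XFS runs quoted in this message. */
static const struct metric metrics[] = {
	{ "block write (K/sec)",	235160.0, 237016.0 },
	{ "block read (K/sec)",		241681.0, 245389.0 },
	{ "sequential create (/sec)",	 11000.0,  12772.0 },
	{ "sequential delete (/sec)",	 13765.0,  14759.0 },
	{ "random create (/sec)",	 11977.0,  13499.0 },
	{ "random delete (/sec)",	 11049.0,  11617.0 },
};

/* Print every metric together with its relative change. */
static void print_changes(void)
{
	size_t i;

	for (i = 0; i < sizeof(metrics) / sizeof(metrics[0]); i++)
		printf("%-26s %9.0f -> %9.0f  %+6.1f%%\n",
		       metrics[i].name, metrics[i].before, metrics[i].after,
		       pct_change(metrics[i].before, metrics[i].after));
}
```

Calling print_changes() from a main() shows the block throughput figures moving by about one percent, while the create and delete rates move by roughly 5 to 16 percent. With only 16 files per run, differences of that size are hard to separate from noise.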
From owner-xfs@oss.sgi.com Thu Nov 29 23:47:18 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 29 Nov 2007 23:47:21 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAU7lF2T031479 for ; Thu, 29 Nov 2007 23:47:17 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id SAA22603; Fri, 30 Nov 2007 18:47:16 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAU7lFdD126445527; Fri, 30 Nov 2007 18:47:16 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAU7lCP9126065786; Fri, 30 Nov 2007 18:47:12 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Fri, 30 Nov 2007 18:47:12 +1100 From: David Chinner To: Christian Kujau Cc: Christoph Hellwig , linux-xfs@oss.sgi.com, LKML Subject: Re: [PATCH] xfs: revert to double-buffering readdir Message-ID: <20071130074712.GM119954183@sgi.com> References: <20071114070400.GA25708@puku.stupidest.org> <20071125163014.GA17922@infradead.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4956/Thu Nov 29 16:56:01 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13823 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Fri, Nov 30, 2007 at 12:45:05AM +0100, Christian Kujau wrote: > On Sun, 25 Nov 2007, Christoph Hellwig 
wrote: > >This patch does exactly that and reverts xfs_file_readdir to what's > >basically the 2.6.23 version minus the uio and vnops junk. > > Thanks, works here too (without nordirplus as a mountoption). > Am I supposed to close the bug[0] or do you guys want to leave this > open to track the Real Fix (TM) for 2.6.25? I've been giving the fix some QA - that change appears to have caused a different regression as well so I'm holding off for a little bit until we know what the cause of the other regression is before deciding whether to take this fix or back the entire change out. Either way we'll include the fix in 2.6.24.... Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Thu Nov 29 23:54:34 2007 Received: with ECARTIS (v1.0.0; list xfs); Thu, 29 Nov 2007 23:54:36 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00,MIME_8BIT_HEADER autolearn=no version=3.3.0-r574664 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAU7sTFU032616 for ; Thu, 29 Nov 2007 23:54:33 -0800 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id SAA22725; Fri, 30 Nov 2007 18:54:37 +1100 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id lAU7sZdD126418016; Fri, 30 Nov 2007 18:54:36 +1100 (AEDT) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id lAU7sVio124626975; Fri, 30 Nov 2007 18:54:31 +1100 (AEDT) X-Authentication-Warning: snort.melbourne.sgi.com: dgc set sender to dgc@sgi.com using -f Date: Fri, 30 Nov 2007 18:54:31 +1100 From: David Chinner To: =?iso-8859-1?B?VMOzdGg=?= Csaba Cc: David Chinner , Bernd 
Schubert , linux-xfs@oss.sgi.com Subject: Re: XFS performance problems on Linux x86_64 Message-ID: <20071130075431.GN119954183@sgi.com> References: <474C8A05.3020604@e-626.net> <20071127220536.GL119954183@sgi.com> <20071130045808.GK119954183@sgi.com> <474FB90C.4060905@tsabi.hu> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <474FB90C.4060905@tsabi.hu> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV 0.91.2/4956/Thu Nov 29 16:56:01 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13824 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Fri, Nov 30, 2007 at 08:17:32AM +0100, Tóth Csaba wrote: > Hello list, > > I tried this parameters, and got this results with bonnie++. As i think > there isnt any speedup with this parameteres, or i am doing something wrong? The latter. Try creating more than 16 files in your test. Maybe 160,000 instead? Cheers, Dave. 
-- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Fri Nov 30 00:17:31 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 30 Nov 2007 00:17:34 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=0.6 required=5.0 tests=AWL,BAYES_20,MIME_8BIT_HEADER autolearn=no version=3.3.0-r574664 Received: from mars.puhasoft.hu (mars.puhasoft.hu [212.108.197.33]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAU8HPdX004975 for ; Fri, 30 Nov 2007 00:17:31 -0800 Received: from localhost (localhost [127.0.0.1]) by mars.puhasoft.hu (Postfix) with ESMTP id 6B7E94064D9 for ; Fri, 30 Nov 2007 09:17:33 +0100 (CET) Received: from mars.puhasoft.hu ([127.0.0.1]) by localhost (mars.puhasoft.hu [127.0.0.1]) (amavisd-maia, port 10024) with ESMTP id 05638-03 for ; Fri, 30 Nov 2007 09:17:26 +0100 (CET) Received: from [10.1.0.13] (pool-2694.adsl.interware.hu [213.178.110.134]) by mars.puhasoft.hu (Postfix) with ESMTP id 1C593406F82 for ; Fri, 30 Nov 2007 09:17:26 +0100 (CET) Message-ID: <474FC712.1010809@tsabi.hu> Date: Fri, 30 Nov 2007 09:17:22 +0100 From: =?UTF-8?B?VMOzdGggQ3NhYmE=?= User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; hu; rv:1.8.1.6) Gecko/20070728 Thunderbird/2.0.0.6 Mnenhy/0.7.5.666 MIME-Version: 1.0 To: linux-xfs@oss.sgi.com Subject: Re: XFS performance problems on Linux x86_64 References: <474C8A05.3020604@e-626.net> <20071127220536.GL119954183@sgi.com> <20071130045808.GK119954183@sgi.com> <474FB90C.4060905@tsabi.hu> <20071130075431.GN119954183@sgi.com> In-Reply-To: <20071130075431.GN119954183@sgi.com> X-Face: =`Ln&-|yFvo0~G^%,v)#*1!$]jonPI@q#'MN{TdjKPSzyw'{zP{-\wYQ.Sg:_D1xQ|Fb?(:$_/D>DHVlzAozh2TeKwt@?T.~@m][wov3Uv=UV(!h&j6uj^yOu1B3%+ddo;Gl"pSK Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Virus-Scanned: ClamAV 0.91.2/4956/Thu Nov 29 16:56:01 2007 on oss.sgi.com X-Virus-Scanned: Maia Mailguard 1.0.2 
X-Virus-Status: Clean X-archive-position: 13825 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: tsabi@tsabi.hu Precedence: bulk X-list: xfs Hey, David Chinner wrote: > On Fri, Nov 30, 2007 at 08:17:32AM +0100, Tóth Csaba wrote: >> Hello list, >> >> I tried this parameters, and got this results with bonnie++. As i think >> there isnt any speedup with this parameteres, or i am doing something wrong? > > Try creating more than 16 files in your test. Maybe 160,000 instead? I believe bonnie++ has a test like that too (I don't know exactly which tests bonnie++ has). OK, thank you for the reply. I just didn't understand why I didn't get the same results. tsabi From owner-xfs@oss.sgi.com Fri Nov 30 01:39:06 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 30 Nov 2007 01:39:10 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from verein.lst.de (verein.lst.de [213.95.11.210]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAU9d2NH017791 for ; Fri, 30 Nov 2007 01:39:05 -0800 Received: from verein.lst.de (localhost [127.0.0.1]) by verein.lst.de (8.12.3/8.12.3/Debian-7.1) with ESMTP id lAU9d9F3003050 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO) for ; Fri, 30 Nov 2007 10:39:09 +0100 Received: (from hch@localhost) by verein.lst.de (8.12.3/8.12.3/Debian-6.6) id lAU9d9XZ003048 for xfs@oss.sgi.com; Fri, 30 Nov 2007 10:39:09 +0100 Date: Fri, 30 Nov 2007 10:39:09 +0100 From: Christoph Hellwig To: xfs@oss.sgi.com Subject: [PATCH] dmapi: kill last 2.4 compat leftovers Message-ID: <20071130093909.GB2949@lst.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-Scanned-By: MIMEDefang 2.39 X-Virus-Scanned: ClamAV 0.91.2/4956/Thu Nov 29
16:56:01 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13827 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@lst.de Precedence: bulk X-list: xfs Signed-off-by: Christoph Hellwig Index: linux-2.6-xfs/fs/dmapi/dmapi_register.c =================================================================== --- linux-2.6-xfs.orig/fs/dmapi/dmapi_register.c 2007-09-29 11:49:22.000000000 +0200 +++ linux-2.6-xfs/fs/dmapi/dmapi_register.c 2007-09-29 11:49:39.000000000 +0200 @@ -34,10 +34,8 @@ #include #include #include -#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,0) #include #include -#endif #include #include #include @@ -1520,33 +1518,6 @@ dm_getall_disp( return(error); } -#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0) -#define d_alloc_anon dmapi_alloc_anon -static struct dentry * -d_alloc_anon(struct inode *inode) -{ - struct dentry *dentry; - - spin_lock(&dcache_lock); - list_for_each_entry(dentry, &inode->i_dentry, d_alias) { - if (!(dentry->d_flags & DCACHE_NFSD_DISCONNECTED)) - goto found; - } - spin_unlock(&dcache_lock); - - dentry = d_alloc_root(inode); - if (likely(dentry != NULL)) - dentry->d_flags |= DCACHE_NFSD_DISCONNECTED; - return dentry; - found: - dget_locked(dentry); - dentry->d_vfs_flags |= DCACHE_REFERENCED; - spin_unlock(&dcache_lock); - iput(inode); - return dentry; -} -#endif - int dm_open_by_handle_rvp( unsigned int fd, Index: linux-2.6-xfs/fs/dmapi/dmapi_sysent.c =================================================================== --- linux-2.6-xfs.orig/fs/dmapi/dmapi_sysent.c 2007-09-29 11:49:42.000000000 +0200 +++ linux-2.6-xfs/fs/dmapi/dmapi_sysent.c 2007-09-29 11:49:57.000000000 +0200 @@ -706,14 +706,6 @@ dmapi_init_procfs(int dmapi_minor) entry = create_proc_read_entry( DMAPI_DBG_PROCFS "/summary", 0, NULL, dmapi_summary, NULL); entry->owner = THIS_MODULE; - -#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0) - entry = proc_mknod( DMAPI_PROCFS, S_IFCHR | S_IRUSR | 
S_IWUSR, - NULL, mk_kdev(MISC_MAJOR,dmapi_minor)); - if( entry == NULL ) - return; - entry->owner = THIS_MODULE; -#endif #endif } @@ -722,9 +714,6 @@ static void __exit dmapi_cleanup_procfs(void) { #ifdef CONFIG_PROC_FS -#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0) - remove_proc_entry( DMAPI_PROCFS, NULL); -#endif remove_proc_entry( DMAPI_DBG_PROCFS "/summary", NULL); remove_proc_entry( DMAPI_DBG_PROCFS "/fsreg", NULL); remove_proc_entry( DMAPI_DBG_PROCFS "/sessions", NULL); From owner-xfs@oss.sgi.com Fri Nov 30 01:38:35 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 30 Nov 2007 01:38:38 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from verein.lst.de (verein.lst.de [213.95.11.210]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAU9cV0K017691 for ; Fri, 30 Nov 2007 01:38:34 -0800 Received: from verein.lst.de (localhost [127.0.0.1]) by verein.lst.de (8.12.3/8.12.3/Debian-7.1) with ESMTP id lAU9caF3002996 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO) for ; Fri, 30 Nov 2007 10:38:37 +0100 Received: (from hch@localhost) by verein.lst.de (8.12.3/8.12.3/Debian-6.6) id lAU9caaJ002994 for xfs@oss.sgi.com; Fri, 30 Nov 2007 10:38:36 +0100 Date: Fri, 30 Nov 2007 10:38:36 +0100 From: Christoph Hellwig To: xfs@oss.sgi.com Subject: [PATCH] xfs: kill last 2.4 ifdef leftover Message-ID: <20071130093836.GA2949@lst.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-Scanned-By: MIMEDefang 2.39 X-Virus-Scanned: ClamAV 0.91.2/4956/Thu Nov 29 16:56:01 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13826 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@lst.de Precedence: bulk X-list: xfs Signed-off-by: Christoph 
Hellwig Index: linux-2.6-xfs/fs/xfs/xfs.h =================================================================== --- linux-2.6-xfs.orig/fs/xfs/xfs.h 2007-09-29 11:48:48.000000000 +0200 +++ linux-2.6-xfs/fs/xfs/xfs.h 2007-09-29 11:49:03.000000000 +0200 @@ -41,11 +41,6 @@ #define XFS_FILESTREAMS_TRACE 1 #endif -#include -#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,0) #include -#else -#include -#endif #endif /* __XFS_H__ */ From owner-xfs@oss.sgi.com Fri Nov 30 02:11:50 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 30 Nov 2007 02:11:53 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from verein.lst.de (verein.lst.de [213.95.11.210]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAUABlI9022366 for ; Fri, 30 Nov 2007 02:11:49 -0800 Received: from verein.lst.de (localhost [127.0.0.1]) by verein.lst.de (8.12.3/8.12.3/Debian-7.1) with ESMTP id lAUABrF3004217 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO) for ; Fri, 30 Nov 2007 11:11:53 +0100 Received: (from hch@localhost) by verein.lst.de (8.12.3/8.12.3/Debian-6.6) id lAUABrNA004214 for xfs@oss.sgi.com; Fri, 30 Nov 2007 11:11:53 +0100 Date: Fri, 30 Nov 2007 11:11:53 +0100 From: Christoph Hellwig To: xfs@oss.sgi.com Subject: current cvs doesn't compile Message-ID: <20071130101153.GA4150@lst.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-Scanned-By: MIMEDefang 2.39 X-Virus-Scanned: ClamAV 0.91.2/4956/Thu Nov 29 16:56:01 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13828 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@lst.de Precedence: bulk X-list: xfs xfs_mark_inode_dirty_sync is used in xfs_inode_item_format but isn't actually declared anywhere. 
Btw, we also still have the bug that fs/xfs/Kbuild exists as an empty file, and without removing it nothing is recompiled at all. From owner-xfs@oss.sgi.com Fri Nov 30 02:18:21 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 30 Nov 2007 02:18:23 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from verein.lst.de (verein.lst.de [213.95.11.210]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAUAIICQ023664 for ; Fri, 30 Nov 2007 02:18:21 -0800 Received: from verein.lst.de (localhost [127.0.0.1]) by verein.lst.de (8.12.3/8.12.3/Debian-7.1) with ESMTP id lAUAIPF3004454 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO) for ; Fri, 30 Nov 2007 11:18:25 +0100 Received: (from hch@localhost) by verein.lst.de (8.12.3/8.12.3/Debian-6.6) id lAUAIOV5004452 for xfs@oss.sgi.com; Fri, 30 Nov 2007 11:18:24 +0100 Date: Fri, 30 Nov 2007 11:18:24 +0100 From: Christoph Hellwig To: xfs@oss.sgi.com Subject: Re: current cvs doesn't compile Message-ID: <20071130101824.GA4419@lst.de> References: <20071130101153.GA4150@lst.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071130101153.GA4150@lst.de> User-Agent: Mutt/1.3.28i X-Scanned-By: MIMEDefang 2.39 X-Virus-Scanned: ClamAV 0.91.2/4956/Thu Nov 29 16:56:01 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13829 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: hch@lst.de Precedence: bulk X-list: xfs On Fri, Nov 30, 2007 at 11:11:53AM +0100, Christoph Hellwig wrote: > xfs_mark_inode_dirty_sync is used in xfs_inode_item_format but isn't > actually declared anywhere. Sorry, this was due to me still having an old patch applied.
> Btw, we also still have the bug that fs/xfs/Kbuild exists as an empty > files and without removing it nothing is recompiled at all. But this is still a real issue.. From owner-xfs@oss.sgi.com Fri Nov 30 10:07:20 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 30 Nov 2007 10:07:23 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from sovereign.computergmbh.de (sovereign.computergmbh.de [85.214.69.204]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAUI7I0V029883 for ; Fri, 30 Nov 2007 10:07:19 -0800 Received: by sovereign.computergmbh.de (Postfix, from userid 25121) id 803461802CE28; Fri, 30 Nov 2007 19:07:26 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by sovereign.computergmbh.de (Postfix) with ESMTP id 752481C0596FB for ; Fri, 30 Nov 2007 19:07:26 +0100 (CET) Date: Fri, 30 Nov 2007 19:07:26 +0100 (CET) From: Jan Engelhardt To: xfs@oss.sgi.com Subject: ACL limit Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: ClamAV 0.91.2/4964/Fri Nov 30 08:24:45 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13830 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jengelh@computergmbh.de Precedence: bulk X-list: xfs Hi, is there any way to raise the number of ACLs that can be stored? The current limit of 25 is quite tight, where ext3 allows 124 and jfs 8192. Would increasing XFS_ACL_MAX_ENTRIES work (yes, using potentially more memory), i.e. not interfering with the on-disk format? 
thanks, Jan From owner-xfs@oss.sgi.com Fri Nov 30 14:31:56 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 30 Nov 2007 14:32:00 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from ninsei.hu (ninsei.hu [212.92.23.158]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with ESMTP id lAUMVrCO028627 for ; Fri, 30 Nov 2007 14:31:55 -0800 Received: from luba (lns-bzn-35-82-250-208-136.adsl.proxad.net [82.250.208.136]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by chatsubo.ninsei.hu (Postfix) with ESMTP id D62C07A32 for ; Fri, 30 Nov 2007 23:31:35 +0100 (CET) Received: by luba (Postfix, from userid 32266) id 1054B10A917; Fri, 30 Nov 2007 23:31:54 +0100 (CET) Date: Fri, 30 Nov 2007 23:31:54 +0100 From: KELEMEN Peter To: xfs@oss.sgi.com Subject: Re: 2.6.24-rc3 oopses while mounting fs Message-ID: <20071130223154.GB13589@luba> Mail-Followup-To: xfs@oss.sgi.com References: <20071128134523.GF7793@luba> <474E003A.7020000@sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <474E003A.7020000@sgi.com> Organization: CERN European Laboratory for Particle Physics, Switzerland X-GPG-KeyID: 1024D/9FF0CABE 2004-04-03 X-GPG-Fingerprint: 6C9E 5917 3B06 E4EE 6356 7BF0 8F3E CAB6 9FF0 CABE X-Comment: Personal opinion. Paragraphs might have been reformatted. X-Copyright: Forwarding or publishing without permission is prohibited. 
X-Accept-Language: hu,en User-Agent: Mutt/1.5.17 (2007-11-01) X-Virus-Scanned: ClamAV 0.91.2/4965/Fri Nov 30 08:59:13 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13831 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: Peter.Kelemen@cern.ch Precedence: bulk X-list: xfs * Lachlan Mcilroy (lachlan@sgi.com) [20071129 10:56]: Lachlan, > This looks like a problem we've just fixed, try this patch. > We'll get this to mainline soon. Thanks for the patch. Unfortunately, it does not seem to fix my problem. Attempting to mount the filesystem still results in an oops. Peter -----8<--- SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled SGI XFS Quota Management subsystem XFS mounting filesystem sdc Starting XFS recovery on filesystem: sdc (logdev: internal) Unable to handle kernel paging request at ffffc20001c9b000 RIP: [] :xfs:xlog_recover_add_to_trans+0x64/0xef PGD 15781d067 PUD 15781c067 PMD 130fd3067 PTE 0 Oops: 0000 [1] SMP CPU 1 Modules linked in: xfs sd_mod 3w_9xxx scsi_mod ohci_hcd uhci_hcd ehci_hcd ipv6 bnx2 sky2 r8169 ns838 20 dl2k acenic e100 tg3 e1000 mii Pid: 2431, comm: mount Not tainted 2.6.24-rc3-xfsbuf #2 RIP: 0010:[] [] :xfs:xlog_recover_add_to_trans+0x64/0xef RSP: 0018:ffff8101320d9728 EFLAGS: 00010286 RAX: ffffc20001c9c000 RBX: 000000000020bd78 RCX: 00000000001cc280 RDX: 0000000000000000 RSI: ffffc20001c9b000 RDI: ffffc20001cdbaf8 RBP: ffff8101320d9758 R08: ffff810157801652 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000002 R12: ffff8101324633b8 R13: ffffc20001c5b508 R14: ffff810132463110 R15: ffffc20001c5b4fc FS: 00002abba6e87b00(0000) GS:ffff810157801708(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffc20001c9b000 CR3: 000000013252a000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
0000000000000400 Process mount (pid: 2431, threadinfo ffff8101320d8000, task ffff810131972000) Stack: 0020bd785769c5e8 000000005769c5e8 000000000000000a ffffc20001c5b508 000000000000011e ffffc20001c5b4fc ffff8101320d97b8 ffffffff881c79db ffffc20001c6a000 00000001881da0bc ffff81013188f000 ffff8101320d9828 Call Trace: [] :xfs:xlog_recover_process_data+0x184/0x1d9 [] :xfs:xlog_do_recovery_pass+0x74b/0x795 [] :xfs:xlog_do_log_recovery+0x3d/0x82 [] :xfs:xlog_do_recover+0x12/0x11c [] :xfs:xlog_recover+0x84/0x92 [] :xfs:xfs_log_mount+0x8c/0xe4 [] :xfs:xfs_mountfs+0x67d/0x97b [] :xfs:xfs_mru_cache_create+0x170/0x1d5 [] :xfs:xfs_fstrm_free_func+0x0/0x81 [] :xfs:xfs_ioinit+0xb/0xd [] :xfs:xfs_mount+0x2bb/0x36b [] :xfs:xfs_fs_fill_super+0xd2/0x245 [] get_filesystem+0x17/0x39 [] sget+0x3fb/0x418 [] sget+0x403/0x418 [] set_bdev_super+0x0/0x14 [] get_sb_bdev+0x123/0x16f [] :xfs:xfs_fs_fill_super+0x0/0x245 [] :xfs:xfs_fs_get_sb+0x13/0x18 [] vfs_kern_mount+0x8f/0x11c [] do_kern_mount+0x44/0xf4 [] do_mount+0x6d8/0x71e [] __up_read+0x93/0x9b [] up_read+0x23/0x27 [] do_page_fault+0x42e/0x7c7 [] zone_statistics+0x64/0x69 [] __alloc_pages+0x6b/0x311 [] sys_mount+0x8a/0xcc [] system_call+0x7e/0x83 Code: f3 a4 49 89 c7 49 8b 54 24 08 8b 42 18 85 c0 74 0e 3b 42 14 -- .+'''+. .+'''+. .+'''+. .+'''+. 
.+'' Kelemen Péter / \ / \ Peter.Kelemen@cern.ch .+' `+...+' `+...+' `+...+' `+...+' From owner-xfs@oss.sgi.com Fri Nov 30 15:03:06 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 30 Nov 2007 15:03:10 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from mail.integraonline.com (relay4.integra.net [204.130.255.183]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAUN3599031854 for ; Fri, 30 Nov 2007 15:03:06 -0800 Received: (qmail 331 invoked from network); 30 Nov 2007 22:36:35 -0000 Received: from unknown (HELO ?192.168.1.107?) (76.164.13.124) by 0 with SMTP; 30 Nov 2007 22:36:35 -0000 In-Reply-To: <474FBA21.4070201@sgi.com> References: <20071114070400.GA25708@puku.stupidest.org> <20071125163014.GA17922@infradead.org> <474FBA21.4070201@sgi.com> Mime-Version: 1.0 (Apple Message framework v752.3) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <165B249C-FE97-4B27-927B-B39DE316CB23@xfs.org> Cc: Christoph Hellwig , Chris Wedgwood , linux-xfs@oss.sgi.com, LKML Content-Transfer-Encoding: 7bit From: Stephen Lord Subject: Re: [PATCH] xfs: revert to double-buffering readdir Date: Fri, 30 Nov 2007 16:36:25 -0600 To: Timothy Shimmin X-Mailer: Apple Mail (2.752.3) X-Virus-Scanned: ClamAV 0.91.2/4965/Fri Nov 30 08:59:13 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13832 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lord@xfs.org Precedence: bulk X-list: xfs Wow, was it really that long ago! Looks like the readdir is in the bowels of the btree code when filldir gets called here, there are probably locks on several buffers in the btree at this point. This will only show up for large directories I bet. 
The xfs readdir code has the complete xfs inode number in its hands at this point (filldir is not necessarily getting all the bits of it). All we are doing the lookup for really is to get the inode number back again so we can get the inode and get the attributes. Rather dumb really. There has got to be a way of doing a callout structure here so that the inode number can be pushed through filldir and back into an fs specific call. The fs then can do a lookup by id - which is what it does most of the time for resolving nfs handles anyway. Should be more efficient than the current scheme. Just rambling, not a single line of code was consulted in writing this message. You want to make a big fat btree directory for testing this stuff. Make sure it gets at least a couple of layers of node blocks. Steve On Nov 30, 2007, at 1:22 AM, Timothy Shimmin wrote: > Christoph Hellwig wrote: >> The current readdir implementation deadlocks on a btree buffers locks >> because nfsd calls back into ->lookup from the filldir callback. The >> only short-term fix for this is to revert to the old inefficient >> double-buffering scheme. > > Probably why Steve did this: :) > > xfs_file.c > ---------------------------- > revision 1.40 > date: 2001/03/15 23:33:20; author: lord; state: Exp; lines: +54 -17 > modid: 2.4.x-xfs:slinx:90125a > Change linvfs_readdir to allocate a buffer, call xfs to fill it, and > then call the filldir function on each entry. This is instead of > doing the > filldir deep in the bowels of xfs which causes locking problems. 
> ---------------------------- > From owner-xfs@oss.sgi.com Fri Nov 30 15:04:30 2007 Received: with ECARTIS (v1.0.0; list xfs); Fri, 30 Nov 2007 15:04:34 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.3.0-r574664 (2007-09-11) on oss.sgi.com X-Spam-Level: X-Spam-Status: No, score=-1.1 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.0-r574664 Received: from smtp116.sbc.mail.sp1.yahoo.com (smtp116.sbc.mail.sp1.yahoo.com [69.147.64.89]) by oss.sgi.com (8.12.11.20060308/8.12.10/SuSE Linux 0.7) with SMTP id lAUN4SUX032208 for ; Fri, 30 Nov 2007 15:04:30 -0800 Received: (qmail 53267 invoked from network); 30 Nov 2007 23:04:37 -0000 Received: from unknown (HELO stupidest.org) (cwedgwood@sbcglobal.net@24.5.75.45 with login) by smtp116.sbc.mail.sp1.yahoo.com with SMTP; 30 Nov 2007 23:04:37 -0000 X-YMail-OSG: 4eG2ZXYVM1kE19Y3nwScO.Ibi6Jz3ydL.xqwLH3vvwlLrjWeo_ho9PfOMMwGwA3fkjqpaSv9Gg-- Received: by tuatara.stupidest.org (Postfix, from userid 10000) id AF7582824F4C; Fri, 30 Nov 2007 15:04:35 -0800 (PST) Date: Fri, 30 Nov 2007 15:04:35 -0800 From: Chris Wedgwood To: Stephen Lord Cc: Timothy Shimmin , Christoph Hellwig , linux-xfs@oss.sgi.com, LKML Subject: Re: [PATCH] xfs: revert to double-buffering readdir Message-ID: <20071130230435.GA12626@puku.stupidest.org> References: <20071114070400.GA25708@puku.stupidest.org> <20071125163014.GA17922@infradead.org> <474FBA21.4070201@sgi.com> <165B249C-FE97-4B27-927B-B39DE316CB23@xfs.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <165B249C-FE97-4B27-927B-B39DE316CB23@xfs.org> X-Virus-Scanned: ClamAV 0.91.2/4965/Fri Nov 30 08:59:13 2007 on oss.sgi.com X-Virus-Status: Clean X-archive-position: 13833 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: cw@f00f.org Precedence: bulk X-list: xfs On Fri, Nov 30, 2007 at 04:36:25PM -0600, Stephen Lord wrote: > Looks like the readdir is in the bowels of the 
btree code when
> filldir gets called here, there are probably locks on several
> buffers in the btree at this point. This will only show up for large
> directories I bet.

I see it for fairly small directories. Larger than what you can stuff
into an inode but less than a block (I'm not checking but fairly sure
that's the case).

> Just rambling, not a single line of code was consulted in writing
> this message.

Can you explain why the offset is capped and treated in an 'odd way'
at all?

+	curr_offset = filp->f_pos;
+	if (curr_offset == 0x7fffffff)
+		offset = 0xffffffff;
+	else
+		offset = filp->f_pos;

and later the offset to filldir is masked. Is that some restriction
in filldir?
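
For readers following the question, the quoted fragment can be restated as a small standalone sketch. This is only an illustration under stated assumptions, not the code from the patch: the helper names, the 64-bit types, and in particular the mask value 0x7fffffff are hypothetical (the message says only that the offset "is masked", not with what).

```c
#include <stdint.h>

/* Values taken from the quoted fragment. */
#define F_POS_CAP        0x7fffffffLL   /* f_pos value that gets special-cased */
#define INTERNAL_END     0xffffffffULL  /* offset substituted for the cap */

/* ASSUMPTION: the mask applied before calling filldir. The message does
 * not state the mask; 0x7fffffff is used here purely for illustration. */
#define ASSUMED_FILLDIR_MASK 0x7fffffffULL

/* Hypothetical helper mirroring the quoted if/else: a file position equal
 * to the cap is replaced by 0xffffffff, anything else passes through. */
static uint64_t internal_offset_from_f_pos(int64_t f_pos)
{
	if (f_pos == F_POS_CAP)
		return INTERNAL_END;
	return (uint64_t)f_pos;
}

/* Hypothetical helper for the "masked" step, using the assumed mask. */
static uint64_t filldir_offset(uint64_t internal)
{
	return internal & ASSUMED_FILLDIR_MASK;
}
```

Under these assumptions a position of 0x7fffffff becomes 0xffffffff internally and masks back to 0x7fffffff on the way out to filldir, so the value handed to the caller stays within a positive 32-bit range. Whether that is the actual reason for the capping is exactly what the message above is asking.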