From owner-xfs@oss.sgi.com Thu Aug 31 23:55:53 2006 Received: with ECARTIS (v1.0.0; list xfs); Thu, 31 Aug 2006 23:56:21 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id k816tcDW022684 for ; Thu, 31 Aug 2006 23:55:50 -0700 Received: from wobbly.melbourne.sgi.com (wobbly.melbourne.sgi.com [134.14.55.135]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA14969; Fri, 1 Sep 2006 16:54:54 +1000 Received: from wobbly.melbourne.sgi.com (localhost [127.0.0.1]) by wobbly.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id k816sqgw3207623; Fri, 1 Sep 2006 16:54:53 +1000 (EST) Received: (from nathans@localhost) by wobbly.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id k816spc93245563; Fri, 1 Sep 2006 16:54:51 +1000 (EST) Date: Fri, 1 Sep 2006 16:54:50 +1000 From: Nathan Scott To: dgc@melbourne.sgi.com Cc: xfs@oss.sgi.com Subject: review: add a splice command to xfs_io Message-ID: <20060901165450.T3186664@wobbly.melbourne.sgi.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="1LKvkjL3sHcu1TtY" Content-Disposition: inline User-Agent: Mutt/1.2.5i X-archive-position: 8859 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: nathans@sgi.com Precedence: bulk X-list: xfs --1LKvkjL3sHcu1TtY Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Yo Dave, Here's some code which should help exercise the splice functionality from a QA test. Needs a /usr/include/sys/splice.h (attached also, but glibc will get this at some point I guess). No QA test yet though - I will hack something up there on Monday (it's beer o'clock). cheers.
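For anyone reading along without the patch applied: the command added below pushes data from the input file into a pipe and then from the pipe into the output file, both steps with splice(2). A minimal standalone sketch of that two-step pattern follows. It is not the patch's code. It assumes Linux with a glibc that declares splice() in <fcntl.h> under _GNU_SOURCE, and the helper name splice_copy and the 64 KiB chunk size are made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for splice() */
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK	65536		/* illustrative chunk size, not from the patch */

/*
 * Hypothetical helper: copy up to len bytes from in_fd to out_fd,
 * starting at each descriptor's current file position.
 * Returns the number of bytes copied, or -1 on error.
 */
static long long
splice_copy(int in_fd, int out_fd, long long len)
{
	long long	total = 0;
	int		pipefds[2];

	if (pipe(pipefds) < 0)
		return -1;

	while (len > 0) {
		size_t	want = len < CHUNK ? (size_t)len : CHUNK;
		ssize_t	in, out;

		/* step 1: file -> pipe */
		in = splice(in_fd, NULL, pipefds[1], NULL, want, 0);
		if (in == 0)
			break;			/* end of input */
		if (in < 0) {
			total = -1;
			break;
		}

		/* step 2: pipe -> file, until the pipe is drained */
		for (ssize_t left = in; left > 0; left -= out) {
			out = splice(pipefds[0], NULL, out_fd, NULL, left, 0);
			if (out <= 0) {
				total = -1;
				goto done;
			}
		}
		total += in;
		len -= in;
	}
done:
	/* always release the pipe, on success and on error */
	close(pipefds[0]);
	close(pipefds[1]);
	return total;
}
```

Called on two open regular files it behaves like a plain copy. The xfs_io command in the patch differs in that it passes explicit file offsets and the SPLICE_F_* flags selected by -m and -n.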
-- Nathan Index: xfsprogs/io/init.c =================================================================== --- xfsprogs.orig/io/init.c 2006-09-01 09:34:40.679409500 +1000 +++ xfsprogs/io/init.c 2006-09-01 09:36:18.193313250 +1000 @@ -74,6 +74,7 @@ init_commands(void) resblks_init(); sendfile_init(); shutdown_init(); + splice_init(); truncate_init(); } Index: xfsprogs/io/io.h =================================================================== --- xfsprogs.orig/io/io.h 2006-09-01 09:34:40.595404250 +1000 +++ xfsprogs/io/io.h 2006-09-01 09:36:04.013341500 +1000 @@ -131,6 +131,12 @@ extern void sendfile_init(void); #define sendfile_init() do { } while (0) #endif +#ifdef HAVE_SPLICE +extern void splice_init(void); +#else +#define splice_init() do { } while (0) +#endif + #ifdef HAVE_MADVISE extern void madvise_init(void); #else Index: xfsprogs/io/splice.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ xfsprogs/io/splice.c 2006-09-01 16:46:41.929606500 +1000 @@ -0,0 +1,255 @@ +/* + * Copyright (c) 2006 Silicon Graphics, Inc. + * All Rights Reserved. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it would be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write the Free Software Foundation, + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include +#include +#include +#include +#include "init.h" +#include "io.h" + +static cmdinfo_t splice_cmd; + +static void +splice_help(void) +{ + printf(_( +"\n" +" splice part or all of the current open file to a second file\n" +"\n" +" Example:\n" +" 'splice 100m 400m 4k out' - splices 4 kilobytes at 100 megabytes offset\n" +" into the open file, to 400m offset in \"out\"\n" +"\n" +" Copies data between one file descriptor and another. Because this copying\n" +" is done within the kernel, splice does not need to transfer data to and\n" +" from user space.\n" +" -i -- input file, instead of current file descriptor\n" +" -m -- move pages instead of copying\n" +" -n -- operate in non-blocking I/O mode\n" +" Offsets in both the source and target files are optional, as is the size,\n" +" as follows: If one of these arguments is given it's the transfer size. If\n" +" two are given they are offsets. 
If three are given - offset, offset, size.\n" +" Finally, if none of these arguments are given the entire input file will be\n" +" spliced to the output file.\n" +"\n")); +} + +static int +splice_buffer( + int in_fd, + off64_t in_offset, + int out_fd, + off64_t out_offset, + size_t bsize, + int flags, + long long *total) +{ + long long bytes_remaining = *total; + ssize_t bytes, ibytes, obytes; + int pipefds[2]; + int ops = 0; + + *total = 0; + if (pipe(pipefds) < 0) { + perror("pipe"); + return -1; + } + while (bytes_remaining > 0) { + ibytes = min(bytes_remaining, bsize); + bytes = splice(in_fd, &in_offset, + pipefds[1], NULL, ibytes, flags); + if (bytes == 0) + break; + if (bytes < 0) { + perror("splice(in)"); + return -1; + } + ops++; + ibytes = obytes = bytes; + while (obytes > 0) { + bytes = splice(pipefds[0], NULL, + out_fd, &out_offset, obytes, flags); + if (bytes < 0) { + perror("splice(out)"); + return -1; + } + ops++; + obytes -= bytes; + } + *total += ibytes; + if (ibytes >= bytes_remaining) + break; + bytes_remaining -= ibytes; + } + return ops; +} + +static int +splice_f( + int argc, + char **argv) +{ + off64_t inoff, outoff; + long long count, total; + size_t blocksize, sectsize; + struct stat64 instat; + struct timeval t1, t2; + char s1[64], s2[64], ts[64]; + char *infile = NULL; + int Cflag, qflag, flags; + int c, infd = -1, outfd = -1; + + Cflag = qflag = flags = 0; + init_cvtnum(&blocksize, &sectsize); + while ((c = getopt(argc, argv, "i:mnqC")) != EOF) { + switch (c) { + case 'C': + Cflag = 1; + break; + case 'q': + qflag = 1; + break; + case 'm': + flags |= SPLICE_F_MOVE; + break; + case 'n': + flags |= SPLICE_F_NONBLOCK; + break; + case 'i': + infile = optarg; + break; + default: + return command_usage(&splice_cmd); + } + } + + /* + * If no more arguments are given, splice the whole input file + * If one of these arguments is given it's the transfer size + * If two are given they are file offsets + * If three are given - offset, offset, size + */ + 
+ if (optind >= argc) + return command_usage(&splice_cmd); + + if (optind == argc - 1) { + inoff = outoff = 0; + count = -1; + } else if (optind == argc - 2) { + inoff = outoff = 0; + count = cvtnum(blocksize, sectsize, argv[optind]); + if (count < 0) { + printf(_("non-numeric length argument -- %s\n"), + argv[optind]); + return 0; + } + optind++; + } else if (optind == argc - 3 || optind == argc - 4) { + inoff = cvtnum(blocksize, sectsize, argv[optind]); + if (inoff < 0) { + printf(_("non-numeric offset argument -- %s\n"), + argv[optind]); + return 0; + } + optind++; + outoff = cvtnum(blocksize, sectsize, argv[optind]); + if (outoff < 0) { + printf(_("non-numeric offset argument -- %s\n"), + argv[optind]); + return 0; + } + optind++; + if (optind == argc - 1) { + count = -1; + } else { + count = cvtnum(blocksize, sectsize, argv[optind]); + if (count < 0) { + printf(_("non-numeric length argument -- %s\n"), + argv[optind]); + return 0; + } + optind++; + } + } else + return command_usage(&splice_cmd); + + if (((outfd = openfile(argv[optind], NULL, IO_CREAT, 0644)) < 0)) + return 0; + + if (!infile) + infd = file->fd; + else if ((infd = openfile(infile, NULL, IO_READONLY, 0)) < 0) + goto done; + + if (fstat64(infd, &instat) < 0) { + perror("fstat64"); + goto done; + } + if (count == -1) + count = instat.st_size; + total = count; + blocksize = instat.st_blksize; + + gettimeofday(&t1, NULL); + c = splice_buffer(infd, inoff, outfd, outoff, blocksize, flags, &total); + if (c < 0) + goto done; + if (qflag) + goto done; + gettimeofday(&t2, NULL); + t2 = tsub(t2, t1); + + /* Finally, report back -- -C gives a parsable format */ + timestr(&t2, ts, sizeof(ts), Cflag ? 
VERBOSE_FIXED_TIME : 0); + if (!Cflag) { + cvtstr((double)total, s1, sizeof(s1)); + cvtstr(tdiv((double)total, t2), s2, sizeof(s2)); + printf(_("spliced %lld/%lld bytes from offset %lld to offset %lld\n"), + total, count, (long long)inoff, (long long)outoff); + printf(_("%s, %d ops; %s (%s/sec and %.4f ops/sec)\n"), + s1, c, ts, s2, tdiv((double)c, t2)); + } else {/* bytes,ops,time,bytes/sec,ops/sec */ + printf("%lld,%d,%s,%.3f,%.3f\n", + total, c, ts, + tdiv((double)total, t2), tdiv((double)c, t2)); + } +done: + if (infile) + close(infd); + close(outfd); + return 0; +} + +void +splice_init(void) +{ + splice_cmd.name = _("splice"); + splice_cmd.cfunc = splice_f; + splice_cmd.argmin = 1; + splice_cmd.argmax = -1; + splice_cmd.flags = CMD_NOMAP_OK | CMD_FOREIGN_OK; + splice_cmd.args = + _("[-i infile] [inoff [outoff [len]]] outfile"); + splice_cmd.oneline = + _("Splice copy data between two file descriptors via a pipe"); + splice_cmd.help = splice_help; + + add_command(&splice_cmd); +} Index: xfsprogs/io/Makefile =================================================================== --- xfsprogs.orig/io/Makefile 2006-09-01 09:32:31.319325000 +1000 +++ xfsprogs/io/Makefile 2006-09-01 12:03:24.598944250 +1000 @@ -44,6 +44,13 @@ else LSRCFILES += sendfile.c endif +ifeq ($(HAVE_SPLICE),yes) +CFILES += splice.c +LCFLAGS += -DHAVE_SPLICE +else +LSRCFILES += splice.c +endif + ifeq ($(PKG_PLATFORM),irix) LSRCFILES += inject.c resblks.c else Index: xfsprogs/aclocal.m4 =================================================================== --- xfsprogs.orig/aclocal.m4 2006-09-01 12:04:54.696575000 +1000 +++ xfsprogs/aclocal.m4 2006-09-01 12:06:48.043658750 +1000 @@ -157,9 +157,9 @@ AC_DEFUN([AC_PACKAGE_GLOBALS], AC_SUBST(pkg_platform) ]) -# +# # Check if we have a working fadvise system call -# +# AC_DEFUN([AC_HAVE_FADVISE], [ AC_MSG_CHECKING([for fadvise ]) AC_TRY_COMPILE([ @@ -174,9 +174,9 @@ AC_DEFUN([AC_HAVE_FADVISE], AC_SUBST(have_fadvise) ]) -# +# # Check if we have a working 
madvise system call -# +# AC_DEFUN([AC_HAVE_MADVISE], [ AC_MSG_CHECKING([for madvise ]) AC_TRY_COMPILE([ @@ -191,9 +191,9 @@ AC_DEFUN([AC_HAVE_MADVISE], AC_SUBST(have_madvise) ]) -# +# # Check if we have a working mincore system call -# +# AC_DEFUN([AC_HAVE_MINCORE], [ AC_MSG_CHECKING([for mincore ]) AC_TRY_COMPILE([ @@ -208,9 +208,9 @@ AC_DEFUN([AC_HAVE_MINCORE], AC_SUBST(have_mincore) ]) -# +# # Check if we have a working sendfile system call -# +# AC_DEFUN([AC_HAVE_SENDFILE], [ AC_MSG_CHECKING([for sendfile ]) AC_TRY_COMPILE([ @@ -226,6 +226,23 @@ AC_DEFUN([AC_HAVE_SENDFILE], ]) # +# Check if we have a working splice system call +# +AC_DEFUN([AC_HAVE_SPLICE], + [ AC_MSG_CHECKING([for splice ]) + AC_TRY_COMPILE([ +#define _GNU_SOURCE +#define _FILE_OFFSET_BITS 64 +#include + ], [ + splice(0, 0, 0, 0, 0, 0); + ], have_splice=yes + AC_MSG_RESULT(yes), + AC_MSG_RESULT(no)) + AC_SUBST(have_splice) + ]) + +# # Check if we have a getmntent libc call (IRIX, Linux) # AC_DEFUN([AC_HAVE_GETMNTENT], Index: xfsprogs/configure.in =================================================================== --- xfsprogs.orig/configure.in 2006-09-01 12:04:54.612569750 +1000 +++ xfsprogs/configure.in 2006-09-01 12:06:39.123101250 +1000 @@ -52,6 +52,7 @@ AC_HAVE_FADVISE AC_HAVE_MADVISE AC_HAVE_MINCORE AC_HAVE_SENDFILE +AC_HAVE_SPLICE AC_HAVE_GETMNTENT AC_HAVE_GETMNTINFO Index: xfsprogs/m4/package_libcdev.m4 =================================================================== --- xfsprogs.orig/m4/package_libcdev.m4 2006-09-01 12:05:03.605131750 +1000 +++ xfsprogs/m4/package_libcdev.m4 2006-09-01 12:06:29.186480250 +1000 @@ -1,6 +1,6 @@ -# +# # Check if we have a working fadvise system call -# +# AC_DEFUN([AC_HAVE_FADVISE], [ AC_MSG_CHECKING([for fadvise ]) AC_TRY_COMPILE([ @@ -15,9 +15,9 @@ AC_DEFUN([AC_HAVE_FADVISE], AC_SUBST(have_fadvise) ]) -# +# # Check if we have a working madvise system call -# +# AC_DEFUN([AC_HAVE_MADVISE], [ AC_MSG_CHECKING([for madvise ]) AC_TRY_COMPILE([ @@ -32,9 
+32,9 @@ AC_DEFUN([AC_HAVE_MADVISE], AC_SUBST(have_madvise) ]) -# +# # Check if we have a working mincore system call -# +# AC_DEFUN([AC_HAVE_MINCORE], [ AC_MSG_CHECKING([for mincore ]) AC_TRY_COMPILE([ @@ -49,9 +49,9 @@ AC_DEFUN([AC_HAVE_MINCORE], AC_SUBST(have_mincore) ]) -# +# # Check if we have a working sendfile system call -# +# AC_DEFUN([AC_HAVE_SENDFILE], [ AC_MSG_CHECKING([for sendfile ]) AC_TRY_COMPILE([ @@ -67,6 +67,23 @@ AC_DEFUN([AC_HAVE_SENDFILE], ]) # +# Check if we have a working splice system call +# +AC_DEFUN([AC_HAVE_SPLICE], + [ AC_MSG_CHECKING([for splice ]) + AC_TRY_COMPILE([ +#define _GNU_SOURCE +#define _FILE_OFFSET_BITS 64 +#include + ], [ + splice(0, 0, 0, 0, 0, 0); + ], have_splice=yes + AC_MSG_RESULT(yes), + AC_MSG_RESULT(no)) + AC_SUBST(have_splice) + ]) + +# # Check if we have a getmntent libc call (IRIX, Linux) # AC_DEFUN([AC_HAVE_GETMNTENT], Index: xfsprogs/include/builddefs.in =================================================================== --- xfsprogs.orig/include/builddefs.in 2006-09-01 14:58:42.038205250 +1000 +++ xfsprogs/include/builddefs.in 2006-09-01 14:59:00.475357500 +1000 @@ -90,6 +90,7 @@ HAVE_FADVISE = @have_fadvise@ HAVE_MADVISE = @have_madvise@ HAVE_MINCORE = @have_mincore@ HAVE_SENDFILE = @have_sendfile@ +HAVE_SPLICE = @have_splice@ HAVE_GETMNTENT = @have_getmntent@ HAVE_GETMNTINFO = @have_getmntinfo@ --1LKvkjL3sHcu1TtY Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="splice.h" #ifndef SPLICE_H #define SPLICE_H #include #include #include #include #if defined(__i386__) #define __NR_sys_splice 313 #define __NR_sys_tee 315 #define __NR_sys_vmsplice 316 #elif defined(__x86_64__) #define __NR_sys_splice 275 #define __NR_sys_tee 276 #define __NR_sys_vmsplice 278 #elif defined(__powerpc__) || defined(__powerpc64__) #define __NR_sys_splice 283 #define __NR_sys_tee 284 #define __NR_sys_vmsplice 285 #elif defined(__ia64__) #define __NR_sys_splice 1297 #define __NR_sys_tee 1301 #define 
__NR_sys_vmsplice 1302 #else #error unsupported arch #endif #define SPLICE_F_MOVE (0x01) /* move pages instead of copying */ #define SPLICE_F_NONBLOCK (0x02) /* don't block on the pipe splicing (but */ /* we may still block on the fd we splice */ /* from/to, of course */ #define SPLICE_F_MORE (0x04) /* expect more data */ #define SPLICE_F_GIFT (0x08) /* pages passed in are a gift */ _syscall6(int, sys_splice, int, fdin, loff_t *, off_in, int, fdout, loff_t *, off_out, size_t, len, unsigned int, flags); _syscall4(int, sys_vmsplice, int, fd, const struct iovec *, iov, unsigned long, nr_segs, unsigned int, flags); _syscall4(int, sys_tee, int, fdin, int, fdout, size_t, len, unsigned int, flags); static inline int splice(int fdin, loff_t *off_in, int fdout, loff_t *off_out, size_t len, unsigned long flags) { return sys_splice(fdin, off_in, fdout, off_out, len, flags); } static inline int tee(int fdin, int fdout, size_t len, unsigned int flags) { return sys_tee(fdin, fdout, len, flags); } static inline int vmsplice(int fd, const struct iovec *iov, unsigned long nr_segs, unsigned int flags) { return sys_vmsplice(fd, iov, nr_segs, flags); } #endif --1LKvkjL3sHcu1TtY-- From owner-xfs@oss.sgi.com Fri Sep 1 00:38:02 2006 Received: with ECARTIS (v1.0.0; list xfs); Fri, 01 Sep 2006 00:38:10 -0700 (PDT) Received: from deliver.uni-koblenz.de (deliver.uni-koblenz.de [141.26.64.15]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id k817c1DW002417 for ; Fri, 1 Sep 2006 00:38:02 -0700 Received: from localhost (localhost [127.0.0.1]) by deliver.uni-koblenz.de (Postfix) with ESMTP id 686E5B62227; Fri, 1 Sep 2006 08:36:39 +0200 (CEST) Received: from deliver.uni-koblenz.de ([127.0.0.1]) by localhost (deliver [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 28984-01; Fri, 1 Sep 2006 08:36:37 +0200 (CEST) Received: from bliss.uni-koblenz.de (bliss.uni-koblenz.de [141.26.64.65]) (using SSLv3 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by 
deliver.uni-koblenz.de (Postfix) with ESMTP id 98766B55F3E; Fri, 1 Sep 2006 08:36:37 +0200 (CEST) From: Rainer Krienke To: Chris Hane , xfs@oss.sgi.com Subject: Re: XFS and 3.2TB Partition Date: Fri, 1 Sep 2006 08:36:28 +0200 User-Agent: KMail/1.9.4 References: <44F714F2.7050502@gmail.com> In-Reply-To: <44F714F2.7050502@gmail.com> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart1420533.3OVRndO7eY"; protocol="application/pgp-signature"; micalg=pgp-sha1 Content-Transfer-Encoding: 7bit Message-Id: <200609010836.32331.krienke@uni-koblenz.de> X-archive-position: 8860 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: krienke@uni-koblenz.de Precedence: bulk X-list: xfs --nextPart1420533.3OVRndO7eY Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline On Thursday, 31 August 2006 18:57 you wrote: > I am trying to create a 3.2TB partition on my Raid 5. Is there a > document that could help? > > I have a 3ware 9500 controller and 8 *500GB sata drives configured into > a single RAID 5 array. > I have a Raid with about 5TB and no problems creating an xfs filesystem on it. The system is Novell SLES10 with a 2.6.16.21 kernel. At first there was a problem with the raid. The firmware of the raid device needed an upgrade. Before the upgrade I had a maximum of 2TB. In dmesg (or /var/log/boot.msg on SLES10) you should see something like this message if the device (sdc here) is handled correctly: <5>sdc : very big device. try to use READ CAPACITY(16). <5>SCSI device sdc: 10156243968 512-byte hdwr sectors (5199997 MB) <5>sdc: Write Protect is off <7>sdc: Mode Sense: cb 00 00 08 <5>SCSI device sdc: drive cache: write back <5>sdc : very big device. try to use READ CAPACITY(16). 
<5>SCSI device sdc: 10156243968 512-byte hdwr sectors (5199997 MB) Before the firmware update there was an error when trying to read the capacity via READ CAPACITY(16). I created the partitions using parted. fdisk did not work. Have a nice day Rainer -- --------------------------------------------------------------------------- Rainer Krienke, Universitaet Koblenz, Rechenzentrum, Raum A022 Universitaetsstrasse 1, 56070 Koblenz, Tel: +49 261287 -1312, Fax: -1001312 Mail: krienke@uni-koblenz.de, Web: http://www.uni-koblenz.de/~krienke Get my public PGP key: http://www.uni-koblenz.de/~krienke/mypgp.html --------------------------------------------------------------------------- --nextPart1420533.3OVRndO7eY Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (GNU/Linux) iD8DBQBE99Twaldtjc/KDEoRAoshAKCx63Wqwb/s188EdqmGXyLJE79mRgCfbEpN EoR1IWQ8ogyx+D6zmgoccag= =0RXB -----END PGP SIGNATURE----- --nextPart1420533.3OVRndO7eY-- From owner-xfs@oss.sgi.com Fri Sep 1 03:41:32 2006 Received: with ECARTIS (v1.0.0; list xfs); Fri, 01 Sep 2006 03:41:41 -0700 (PDT) Received: from jabber.dneg.com (mail.dneg.com [193.203.82.196]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id k81AfVDW030184 for ; Fri, 1 Sep 2006 03:41:32 -0700 Received: from localhost (localhost.localdomain [127.0.0.1]) by jabber.dneg.com (Postfix) with ESMTP id AB4F1B7000; Fri, 1 Sep 2006 09:41:07 +0100 (BST) Received: from jabber.dneg.com ([127.0.0.1]) by localhost (jabber [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 13808-06; Fri, 1 Sep 2006 09:40:58 +0100 (BST) Received: from [172.16.11.100] (bath.dneg.com [172.16.11.100]) by jabber.dneg.com (Postfix) with ESMTP id 71F3AB6FD4; Fri, 1 Sep 2006 09:40:57 +0100 (BST) Message-ID: <44F7F219.40904@dneg.com> Date: Fri, 01 Sep 2006 09:40:57 +0100 From: Evan Fraser User-Agent: Thunderbird 1.5.0.5 (X11/20060719) MIME-Version: 1.0 To: Chris Hane Cc: xfs@oss.sgi.com Subject: Re: XFS and 3.2TB 
Partition References: <44F714F2.7050502@gmail.com> In-Reply-To: <44F714F2.7050502@gmail.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 8861 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: evan@dneg.com Precedence: bulk X-list: xfs I had that problem when I was using an adaptec aic79xx adapter; using a new LSI-Logic one fixed the problem for me. Could it be a limit with your controller/controller driver? Chris Hane wrote: > > I am trying to create a 3.2TB partition on my Raid 5. Is there a > document that could help? > > I have a 3ware 9500 controller and 8 *500GB sata drives configured > into a single RAID 5 array. > > I am running linux 2.6.16 with the 3ware drivers compiled into the > kernel. > > I've tried a couple of different means to create the partition and > format the file system with xfs without success (or confidence that I > haven't done something wrong). > > 1. FDISK > > I've tried fdisk on the array to create the partition; but it forces > me to enter the number of cylinders before letting me create the > partition. I enter the largest number of cylinders since I'm not sure > how to calculate the correct cylinder number across an 8 disk RAID 5 > array. > > I then create the partition starting at 0 (or whatever the default > was) and ending at 3500GB. > > Once the partition is created this way, I can mkfs.xfs; but I'm a > little hesitant to use this since I input an arbitrary cylinder number. > > Thoughts on what to use for the correct cylinder count with fdisk? > > 2. PARTED > > I've tried to use parted without any success. Here is what I've tried > and the errors I get. > > > parted > parted> mklabel gpt > parted> mkpart primary 0 3500GB > parted> quit > > ok - the partition now exists. If I use ext2 everything works ok. 
> > however, when I run > > > mkfs.xfs /dev/sda1 > > the file system is formatted but is truncated to 2TB. > > > Any advice/pointers on how to partition and format a 3.2TB raid 5 > array would be much appreciated. > > Thanks, > Chris.... > > > -- evan@dneg.com Linux Systems Administrator Double Negative tel: +44 (0)20 7534 4400 fax: +44 (0)20 7534 4452 77 shaftesbury avenue, w1d 5du, London From owner-xfs@oss.sgi.com Fri Sep 1 06:20:31 2006 Received: with ECARTIS (v1.0.0; list xfs); Fri, 01 Sep 2006 06:20:49 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id k81DKDDW024104 for ; Fri, 1 Sep 2006 06:20:27 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id XAA22324; Fri, 1 Sep 2006 23:19:22 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id k81DJIeQ7692301; Fri, 1 Sep 2006 23:19:19 +1000 (AEST) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id k81DJD988721962; Fri, 1 Sep 2006 23:19:13 +1000 (AEST) Date: Fri, 1 Sep 2006 23:19:13 +1000 From: David Chinner To: Jens Axboe Cc: "Jeffrey E. 
Hundstad" , xfs@oss.sgi.com, nathans@sgi.com Subject: Re: vmsplice can't work well Message-ID: <20060901131913.GG5737019@melbourne.sgi.com> References: <44F4440F.1090300@gmail.com> <20060829140542.GN12257@kernel.dk> <44F5CC08.8010205@mnsu.edu> <20060830174815.GF7331@kernel.dk> <44F5D3C6.1010108@mnsu.edu> <20060831092440.GC5528@kernel.dk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20060831092440.GC5528@kernel.dk> User-Agent: Mutt/1.4.2.1i X-archive-position: 8862 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs On Thu, Aug 31, 2006 at 11:24:41AM +0200, Jens Axboe wrote: > XFS list, > > On Wed, Aug 30 2006, Jeffrey E. Hundstad wrote: > > Jens Axboe wrote: > > >On Wed, Aug 30 2006, Jeffrey E. Hundstad wrote: > > > > > >>I tried your splie-git...tar.gz file and tried the splice-cp. It > > >>produced files that are the right length... but the files only contain > > >>nulls. Here's the straces: > > >> > > > > > >Works for me as well. Could be an fs issue, how large was the README and > > >what filesystem did you use? > > > > > > > > The file was 1130 bytes (it was the README in that directory.) The > > filesystem is XFS. > > > > I can reproduce this quite easily, doing: > > nelson:~ # splice-cp sda.blktrace.0 foo > > nelson:~ # md5sum sda.blktrace.0 foo > 4754070ae77091468c830ea23b125d68 sda.blktrace.0 > efdc7b9d00692fdfe91a691277209267 foo Busted write side - splice-in works fine, splice-out is an alias for /dev/zero. The reason it's full of NULLs: death:/mnt# xfs_bmap -vv foo foo: no extents death:/mnt# It's a hole. Nothing has been flushed out to disk. Interesting - the inode is leaving pipe_to_file() dirty, the page is dirty, the buffer head is dirty, delay, mapped and uptodate. The page is the only page in the radix tree and the radix tree is marked dirty. But it never gets flushed out. 
Even when I use dd to seek past the first disk block and write further into the file, I still end up with a hole in the range where the original splice write should be, which means it was no longer in the page cache. Copying a large file I can see dirty memory increase to tens of megabytes. Nothing is going to disk, writeback is not going above zero. Interestingly, when the write completes, the size of the page cache drops by almost exactly the size of the file being written - almost like a truncate_inode_pages() is occurring on file close. Oh, look - we _are_ tossing away all the pages on close. xfs_splice_write() hasn't updated the xfs inode size when extending the file. The linux inode has the correct value, but xfs thinks that it's only got a speculative allocation EOF (i.e. 0) so we invalidate it before it gets to disk. The patch below just copies some code out of xfs_write() where it updates the xfs inode size and drops it in xfs_splice_write(). It's almost certainly not the right fix, but the bucket under the pipe will now catch most of the bits.... Cheers, Dave.
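The failure described above can be pictured with a toy userspace model. This is only a sketch and not XFS code: struct model_file, model_write() and model_close() are invented names, vfs_size stands in for the size held by the Linux inode, and fs_size stands in for the size the filesystem itself tracks (ip->i_d.di_size in the patch). The model assumes that close discards cached data beyond fs_size, which is how the analysis above reads.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_MAX	64	/* capacity of the toy "page cache" */

/* Invented for illustration; none of this exists in XFS. */
struct model_file {
	char	cache[MODEL_MAX];	/* cached, not yet written back */
	size_t	vfs_size;		/* size the Linux inode reports */
	size_t	fs_size;		/* size the filesystem believes in */
};

/*
 * Extend the file from offset 0. When update_fs_size is zero the
 * filesystem's own size is left stale, as in the buggy splice path.
 */
static void
model_write(struct model_file *f, const char *buf, size_t len,
	    int update_fs_size)
{
	if (len > MODEL_MAX)
		len = MODEL_MAX;
	memcpy(f->cache, buf, len);
	if (len > f->vfs_size)
		f->vfs_size = len;
	if (update_fs_size && f->vfs_size > f->fs_size)
		f->fs_size = f->vfs_size;
}

/* On close, throw away cached data past what the filesystem thinks is EOF. */
static void
model_close(struct model_file *f)
{
	memset(f->cache + f->fs_size, 0, MODEL_MAX - f->fs_size);
}
```

With update_fs_size set to 0, a 5-byte write leaves vfs_size at 5 but model_close() zeroes the cache, which mirrors the right-length, all-nulls files reported earlier in the thread. With it set to 1 the data survives the close.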
-- Dave Chinner Principal Engineer SGI Australian Software Group --- fs/xfs/linux-2.6/xfs_lrw.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_lrw.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_lrw.c 2006-08-31 16:17:47.000000000 +1000 +++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_lrw.c 2006-09-01 22:48:56.463190730 +1000 @@ -390,6 +390,8 @@ xfs_splice_write( xfs_inode_t *ip = XFS_BHVTOI(bdp); xfs_mount_t *mp = ip->i_mount; ssize_t ret; + struct inode *inode = outfilp->f_mapping->host; + xfs_fsize_t isize; XFS_STATS_INC(xs_write_calls); if (XFS_FORCED_SHUTDOWN(ip->i_mount)) @@ -416,6 +418,20 @@ xfs_splice_write( if (ret > 0) XFS_STATS_ADD(xs_write_bytes, ret); + isize = i_size_read(inode); + if (unlikely(ret < 0 && ret != -EFAULT && *ppos > isize)) + *ppos = isize; + + if (*ppos > ip->i_d.di_size) { + xfs_ilock(ip, XFS_ILOCK_EXCL); + if (*ppos > ip->i_d.di_size) { + ip->i_d.di_size = *ppos; + i_size_write(inode, *ppos); + ip->i_update_core = 1; + ip->i_update_size = 1; + } + xfs_iunlock(ip, XFS_ILOCK_EXCL); + } xfs_iunlock(ip, XFS_IOLOCK_EXCL); return ret; } From owner-xfs@oss.sgi.com Fri Sep 1 06:42:52 2006 Received: with ECARTIS (v1.0.0; list xfs); Fri, 01 Sep 2006 06:43:12 -0700 (PDT) Received: from kernel.dk (brick.kernel.dk [62.242.22.158]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id k81DgqDW029130 for ; Fri, 1 Sep 2006 06:42:52 -0700 Received: from nelson.home.kernel.dk (nelson.home.kernel.dk [192.168.0.33]) by kernel.dk (Postfix) with ESMTP id A22EE63CE1; Fri, 1 Sep 2006 15:42:13 +0200 (CEST) Received: by nelson.home.kernel.dk (Postfix, from userid 1000) id AF1741192E; Fri, 1 Sep 2006 15:45:12 +0200 (CEST) Date: Fri, 1 Sep 2006 15:45:12 +0200 From: Jens Axboe To: David Chinner Cc: "Jeffrey E. 
Hundstad" , xfs@oss.sgi.com, nathans@sgi.com Subject: Re: vmsplice can't work well Message-ID: <20060901134512.GD25434@kernel.dk> References: <44F4440F.1090300@gmail.com> <20060829140542.GN12257@kernel.dk> <44F5CC08.8010205@mnsu.edu> <20060830174815.GF7331@kernel.dk> <44F5D3C6.1010108@mnsu.edu> <20060831092440.GC5528@kernel.dk> <20060901131913.GG5737019@melbourne.sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20060901131913.GG5737019@melbourne.sgi.com> X-archive-position: 8863 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: axboe@kernel.dk Precedence: bulk X-list: xfs On Fri, Sep 01 2006, David Chinner wrote: > On Thu, Aug 31, 2006 at 11:24:41AM +0200, Jens Axboe wrote: > > XFS list, > > > > On Wed, Aug 30 2006, Jeffrey E. Hundstad wrote: > > > Jens Axboe wrote: > > > >On Wed, Aug 30 2006, Jeffrey E. Hundstad wrote: > > > > > > > >>I tried your splie-git...tar.gz file and tried the splice-cp. It > > > >>produced files that are the right length... but the files only contain > > > >>nulls. Here's the straces: > > > >> > > > > > > > >Works for me as well. Could be an fs issue, how large was the README and > > > >what filesystem did you use? > > > > > > > > > > > The file was 1130 bytes (it was the README in that directory.) The > > > filesystem is XFS. > > > > > > > I can reproduce this quite easily, doing: > > > > nelson:~ # splice-cp sda.blktrace.0 foo > > > > nelson:~ # md5sum sda.blktrace.0 foo > > 4754070ae77091468c830ea23b125d68 sda.blktrace.0 > > efdc7b9d00692fdfe91a691277209267 foo > > Busted write side - splice-in works fine, splice-out is an alias > for /dev/zero. The reason it's full of NULLs: > > death:/mnt# xfs_bmap -vv foo > foo: no extents > death:/mnt# > > It's a hole. Nothing has been flushed out to disk. 
> > Interesting - the inode is leaving pipe_to_file() dirty, the page is > dirty, the buffer head is dirty, delay, mapped and uptodate. The > page is the only page in the radix tree and the radix tree is marked > dirty. > > But it never gets flushed out. Even when I use dd to seek past the > first disk block and write further into the file, I still end up > with a hole in the range where the original splice write should > be, which means it was no longer in the page cache. > > Copying a large file I can see dirty memory increase to tens of > megabytes. Nothing is going to disk, writeback is not going above > zero. Interestingly, when the write completes, the size of the page > cache drops by almost exactly the size of the file being written - > almost like a truncate_inode_pages() is occurring on file close. > > Oh, look - we _are_ tossing away all the pages on close. > > xfs_splice_write() hasn't updated the xfs inode size when extending the > file. The linux inode has the correct value, but xfs thinks that it's > only got a speculative allocation EOF (i.e. 0) so we invalidate it > before it gets to disk. > > The patch below just copies some code out of xfs_write() where it > updates the xfs inode size and drops it in xfs_splice_write(). It's > almost certainly not the right fix, but the bucket under the pipe will > now catch most of the bits.... Good analysis and fix, Dave! I don't have time to test it right now; perhaps Jeffrey can give it a shot? Will you make sure this gets into 2.6.18? 
-- Jens Axboe From owner-xfs@oss.sgi.com Fri Sep 1 09:13:01 2006 Received: with ECARTIS (v1.0.0; list xfs); Fri, 01 Sep 2006 09:13:10 -0700 (PDT) Received: from mail.itsolut.com (mail.itsolut.com [64.182.153.89]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id k81GD0DW021514 for ; Fri, 1 Sep 2006 09:13:00 -0700 Received: by mail.itsolut.com (Postfix, from userid 5004) id 7E87B43E35; Fri, 1 Sep 2006 11:12:27 -0500 (EST) Received: from [192.168.1.3] (adsl-68-251-149-159.dsl.bltnin.ameritech.net [68.251.149.159]) by mail.itsolut.com (Postfix) with ESMTP id 4E9D043223 for ; Fri, 1 Sep 2006 11:12:24 -0500 (EST) Message-ID: <44F85BE7.2010001@gmail.com> Date: Fri, 01 Sep 2006 12:12:23 -0400 From: Chris Hane User-Agent: Thunderbird 1.5.0.5 (Windows/20060719) MIME-Version: 1.0 To: xfs@oss.sgi.com Subject: Re: XFS and 3.2TB Partition References: <44F714F2.7050502@gmail.com> <200609010836.32331.krienke@uni-koblenz.de> In-Reply-To: <200609010836.32331.krienke@uni-koblenz.de> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 8866 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: chrishane@gmail.com Precedence: bulk X-list: xfs Content-Length: 3029 Lines: 79 Thanks for the input. I appreciate everyone's help! I believe I am going to end up not partitioning the raid array and using it directly (as described in an email which I copied below). When we get our next large storage machine in (we are going to need a couple over the next year to store CD & DVD ISO Images), I'm going to experiment some more with the suggestions everyone has given me here. As an FYI: I'm using the latest versions of everything (parted 1.7.1, kernel 2.6.16, 3ware raid controller brand new) Thanks for the help, Chris.... 
Peter Grandi wrote: >>>> On Thu, 31 Aug 2006 12:57:22 -0400, Chris Hane >>>> said: > > chrishane> I am trying to create a 3.2TB partition on my Raid 5. > chrishane> Is there a document that could help? > > The 9500 is fairly recent, so it should not have a lot of 2TB > problems. But there are 2TB limits in several places. For example > old versions of the Linux kernel don't support more than 2TB per > _filesystem_. > > But I suspect that you are trying to create partitions in the > sense of the MS-DOS/MS-Windows partitioning scheme. Check > carefully whether that partitioning scheme allows partitions > larger than 2TB :-). > > Anyhow, usually for very large filesystems you don't need > partitions at all. Just use '/dev/sda'. Or check the other > partitioning schemes supported by Linux, some may have higher > limits. > > chrishane> I have a 3ware 9500 controller and 8 *500GB sata > chrishane> drives configured into a single RAID 5 array. > > Using RAID5 with 8 drives is a great crime. Nothing to do with > your partitioning problems, but since you mentioned it... > Consider reading carefully > Rainer Krienke wrote: > On Thursday, 31 August 2006 18:57 you wrote: >> I am trying to create a 3.2TB partition on my Raid 5. Is there a >> document that could help? >> >> I have a 3ware 9500 controller and 8 *500GB sata drives configured into >> a single RAID 5 array. >> > > I have a Raid with about 5TB and no problems creating an xfs filesystem on it. > The system is Novell SLES10 with a 2.6.16.21 kernel. > > At first there was a problem with the raid. The firmware of the raid device > needed an upgrade. Before the upgrade I had a maximum of 2TB. > > In dmesg (or /var/log/boot.msg on SLES10) you should see something like this > message if the device (sdc here) is handled correctly: > > <5>sdc : very big device. try to use READ CAPACITY(16). 
> <5>SCSI device sdc: 10156243968 512-byte hdwr sectors (5199997 MB) > <5>sdc: Write Protect is off > <7>sdc: Mode Sense: cb 00 00 08 > <5>SCSI device sdc: drive cache: write back > <5>sdc : very big device. try to use READ CAPACITY(16). > <5>SCSI device sdc: 10156243968 512-byte hdwr sectors (5199997 MB) > > Before the firmware update there was an error when trying to read the capacity > via READ CAPACITY(16). > > I created the partitions using parted. fdisk did not work. > > Have a nice day > Rainer From owner-xfs@oss.sgi.com Fri Sep 1 20:33:13 2006 Received: with ECARTIS (v1.0.0; list xfs); Fri, 01 Sep 2006 20:33:29 -0700 (PDT) Received: from avalanche.hickorytech.net (smtp.hickorytech.net [216.114.192.16]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id k823XCDW012434 for ; Fri, 1 Sep 2006 20:33:13 -0700 Received: from localhost (localhost.localdomain [127.0.0.1]) by avalanche.hickorytech.net (Postfix) with ESMTP id EF132204FC9; Fri, 1 Sep 2006 21:25:14 -0500 (CDT) Received: from avalanche.hickorytech.net ([216.114.192.16]) by localhost (avalanche.hickorytech.net [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id NsKYhpW9HJC2; Fri, 1 Sep 2006 21:25:14 -0500 (CDT) Received: from [10.0.0.1] (mn-10k-dhcp2-220.dsl.hickorytech.net [216.114.240.220]) by avalanche.hickorytech.net (Postfix) with ESMTP id B8F1E204FC1; Fri, 1 Sep 2006 21:25:14 -0500 (CDT) Message-ID: <44F8ECE7.2090102@mnsu.edu> Date: Fri, 01 Sep 2006 21:31:03 -0500 From: "Jeffrey E. 
Hundstad" User-Agent: Thunderbird 1.5.0.5 (X11/20060812) MIME-Version: 1.0 To: David Chinner Cc: Jens Axboe , xfs@oss.sgi.com, nathans@sgi.com Subject: Re: vmsplice can't work well References: <44F4440F.1090300@gmail.com> <20060829140542.GN12257@kernel.dk> <44F5CC08.8010205@mnsu.edu> <20060830174815.GF7331@kernel.dk> <44F5D3C6.1010108@mnsu.edu> <20060831092440.GC5528@kernel.dk> <20060901131913.GG5737019@melbourne.sgi.com> In-Reply-To: <20060901131913.GG5737019@melbourne.sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 8868 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: jeffrey.hundstad@mnsu.edu Precedence: bulk X-list: xfs Content-Length: 2707 Lines: 84 David Chinner wrote: > On Thu, Aug 31, 2006 at 11:24:41AM +0200, Jens Axboe wrote: > >> XFS list, >> >> On Wed, Aug 30 2006, Jeffrey E. Hundstad wrote: >> >>> Jens Axboe wrote: >>> >>>> On Wed, Aug 30 2006, Jeffrey E. Hundstad wrote: >>>> >>>> >>>>> I tried your splie-git...tar.gz file and tried the splice-cp. It >>>>> produced files that are the right length... but the files only contain >>>>> nulls. Here's the straces: >>>>> >>>>> >>>> Works for me as well. Could be an fs issue, how large was the README and >>>> what filesystem did you use? >>>> >>>> >>>> >>> The file was 1130 bytes (it was the README in that directory.) The >>> filesystem is XFS. >>> >>> >> I can reproduce this quite easily, doing: >> >> nelson:~ # splice-cp sda.blktrace.0 foo >> >> nelson:~ # md5sum sda.blktrace.0 foo >> 4754070ae77091468c830ea23b125d68 sda.blktrace.0 >> efdc7b9d00692fdfe91a691277209267 foo >> > > Busted write side - splice-in works fine, splice-out is an alias > for /dev/zero. The reason it's full of NULLs: > > death:/mnt# xfs_bmap -vv foo > foo: no extents > death:/mnt# > > It's a hole. Nothing has been flushed out to disk. 
> > Interesting - the inode is leaving pipe_to_file() dirty, the page is > dirty, the buffer head is dirty, delay, mapped and uptodate. The > page is the only page in the radix tree and the radix tree is marked > dirty. > > But it never gets flushed out. Even when I use dd to seek past the > first disk block and write further into the file, I still end up > with a hole in the range where the original splice write should > be which means it was no longer in the page cache. > > Copying a large file I can see dirty memory increase to tens of > megabytes. Nothing is going to disk, writeback is not going above > zero. Interestingly, when the write completes, the size of the page > cache drops by almost exactly the size of the file being written - > almost like a truncate_inode_pages() is occurring on file close. > > Oh, look - we _are_ tossing away all the pages on close. > > xfs_splice_write() hasn't updated the xfs inode size when extending the > file. The linux inode has the correct value, but xfs thinks that it's > only got a speculative allocation EOF (i.e. 0) so we invalidate it > before it gets to disk. > > The patch below just copies some code out of xfs_write() where it updates > the xfs inode size and drops it in xfs_splice_write(). It's almost certainly not > the right fix, but the bucket under the pipe will now catch most of the > bits.... > > Cheers, > > Dave. > I can confirm that this patch allows splice-cp to work as expected! Thanks all! 
-- Jeffrey Hundstad From owner-xfs@oss.sgi.com Sun Sep 3 17:18:16 2006 Received: with ECARTIS (v1.0.0; list xfs); Sun, 03 Sep 2006 17:18:36 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id k840I3DW002413 for ; Sun, 3 Sep 2006 17:18:15 -0700 Received: from wobbly.melbourne.sgi.com (wobbly.melbourne.sgi.com [134.14.55.135]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA17481; Mon, 4 Sep 2006 10:17:15 +1000 Received: from wobbly.melbourne.sgi.com (localhost [127.0.0.1]) by wobbly.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id k840HDgw3331177; Mon, 4 Sep 2006 10:17:13 +1000 (EST) Received: (from nathans@localhost) by wobbly.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id k840HB413272533; Mon, 4 Sep 2006 10:17:11 +1000 (EST) Date: Mon, 4 Sep 2006 10:17:11 +1000 From: Nathan Scott To: lachlan@sgi.com Cc: xfs@oss.sgi.com Subject: review: minor cleanup in xfs_read locking Message-ID: <20060904101711.A3331169@wobbly.melbourne.sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i X-archive-position: 8871 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: nathans@sgi.com Precedence: bulk X-list: xfs Content-Length: 985 Lines: 35 Hi Lachlan, Could you check this for me - it just folds the second direct I/O conditional added in your recent deadlock fix back into the prior branch, which is also direct I/O specific... thanks. 
-- Nathan Index: xfs-linux/linux-2.6/xfs_lrw.c =================================================================== --- xfs-linux.orig/linux-2.6/xfs_lrw.c 2006-09-04 09:59:10.955973000 +1000 +++ xfs-linux/linux-2.6/xfs_lrw.c 2006-09-04 09:59:42.205926000 +1000 @@ -270,12 +270,12 @@ xfs_read( } } - if (unlikely((ioflags & IO_ISDIRECT) && VN_CACHED(vp))) - bhv_vop_flushinval_pages(vp, ctooff(offtoct(*offset)), - -1, FI_REMAPF_LOCKED); - - if (unlikely(ioflags & IO_ISDIRECT)) + if (unlikely((ioflags & IO_ISDIRECT))) { + if (VN_CACHED(vp)) + bhv_vop_flushinval_pages(vp, ctooff(offtoct(*offset)), + -1, FI_REMAPF_LOCKED); mutex_unlock(&inode->i_mutex); + } xfs_rw_enter_trace(XFS_READ_ENTER, &ip->i_iocore, (void *)iovp, segs, *offset, ioflags); From owner-xfs@oss.sgi.com Sun Sep 3 18:10:49 2006 Received: with ECARTIS (v1.0.0; list xfs); Sun, 03 Sep 2006 18:11:08 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id k841AaDW008320 for ; Sun, 3 Sep 2006 18:10:48 -0700 Received: from wobbly.melbourne.sgi.com (wobbly.melbourne.sgi.com [134.14.55.135]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA18407; Mon, 4 Sep 2006 11:09:49 +1000 Received: from wobbly.melbourne.sgi.com (localhost [127.0.0.1]) by wobbly.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id k8419lgw3328556; Mon, 4 Sep 2006 11:09:48 +1000 (EST) Received: (from nathans@localhost) by wobbly.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id k8419jrC3301042; Mon, 4 Sep 2006 11:09:45 +1000 (EST) Date: Mon, 4 Sep 2006 11:09:45 +1000 From: Nathan Scott To: Lachlan McIlroy Cc: xfs@oss.sgi.com Subject: Re: review: minor cleanup in xfs_read locking Message-ID: <20060904110945.A3329063@wobbly.melbourne.sgi.com> References: <20060904101711.A3331169@wobbly.melbourne.sgi.com> <44FB75CB.8050809@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline 
User-Agent: Mutt/1.2.5i In-Reply-To: <44FB75CB.8050809@sgi.com>; from lachlan@sgi.com on Mon, Sep 04, 2006 at 01:39:39AM +0100 X-archive-position: 8872 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: nathans@sgi.com Precedence: bulk X-list: xfs Content-Length: 273 Lines: 12 On Mon, Sep 04, 2006 at 01:39:39AM +0100, Lachlan McIlroy wrote: > Looking a little closer... you could probably do away with the extra > pair of parentheses in the call to unlikely(). > Done, thanks - I'll push in most of my pending stuff shortly. cheers. -- Nathan From owner-xfs@oss.sgi.com Sun Sep 3 18:30:45 2006 Received: with ECARTIS (v1.0.0; list xfs); Sun, 03 Sep 2006 18:30:54 -0700 (PDT) Received: from omx1.americas.sgi.com (omx1.americas.sgi.com [198.149.16.13]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id k841UYDW010939 for ; Sun, 3 Sep 2006 18:30:45 -0700 Received: from internal-mail-relay1.corp.sgi.com (internal-mail-relay1.corp.sgi.com [198.149.32.52]) by omx1.americas.sgi.com (8.12.10/8.12.9/linux-outbound_gateway-1.1) with ESMTP id k840UTnx017092 for ; Sun, 3 Sep 2006 19:30:29 -0500 Received: from [134.15.160.1] (vpn-emea-sw-emea-160-1.emea.sgi.com [134.15.160.1]) by internal-mail-relay1.corp.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id k840U78s36683850; Sun, 3 Sep 2006 17:30:08 -0700 (PDT) Message-ID: <44FB73FA.6010400@sgi.com> Date: Mon, 04 Sep 2006 01:31:54 +0100 From: Lachlan McIlroy Reply-To: lachlan@sgi.com Organization: SGI User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.7.12) Gecko/20050920 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Nathan Scott CC: xfs@oss.sgi.com Subject: Re: review: minor cleanup in xfs_read locking References: <20060904101711.A3331169@wobbly.melbourne.sgi.com> In-Reply-To: <20060904101711.A3331169@wobbly.melbourne.sgi.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit 
X-archive-position: 8873 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs Content-Length: 436 Lines: 16 Looks good Nathan. I've made changes to check return codes from bhv_vop_flushinval_pages() and friends so it's now dependent on this change. I'll post a review as soon as your change has gone in. Nathan Scott wrote: > Hi Lachlan, > > Could you check this for me - it just folds the second direct I/O > conditional added in your recent deadlock fix back into the prior > branch, which is also direct I/O specific... > > thanks. > From owner-xfs@oss.sgi.com Sun Sep 3 18:32:59 2006 Received: with ECARTIS (v1.0.0; list xfs); Sun, 03 Sep 2006 18:33:15 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id k841WkDW011352 for ; Sun, 3 Sep 2006 18:32:57 -0700 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA18914; Mon, 4 Sep 2006 11:31:57 +1000 Received: by chook.melbourne.sgi.com (Postfix, from userid 16302) id 3ED5158CF851; Mon, 4 Sep 2006 11:31:56 +1000 (EST) To: linux-xfs@oss.sgi.com, sgi.bugs.xfs@engr.sgi.com Subject: TAKE 955302 - fix warnings Message-Id: <20060904013157.3ED5158CF851@chook.melbourne.sgi.com> Date: Mon, 4 Sep 2006 11:31:56 +1000 (EST) From: nathans@sgi.com (Nathan Scott) X-archive-position: 8875 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: nathans@sgi.com Precedence: bulk X-list: xfs Content-Length: 1173 Lines: 24 Fix kmem_zalloc_greedy warnings on 64 bit platforms. 
Date: Mon Sep 4 11:31:03 AEST 2006 Workarea: chook.melbourne.sgi.com:/build/nathans/xfs-linux Inspected by: lachlan,vapo The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/xfs-kern/xfs-linux-melb Modid: xfs-linux-melb:xfs-kern:26907a xfs_itable.c - 1.148 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_itable.c.diff?r1=text&tr1=1.148&r2=text&tr2=1.147&f=h xfs_vfsops.c - 1.511 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_vfsops.c.diff?r1=text&tr1=1.511&r2=text&tr2=1.510&f=h xfs_mount.h - 1.227 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_mount.h.diff?r1=text&tr1=1.227&r2=text&tr2=1.226&f=h quota/xfs_qm.c - 1.44 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/quota/xfs_qm.c.diff?r1=text&tr1=1.44&r2=text&tr2=1.43&f=h linux-2.6/xfs_ksyms.c - 1.51 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.6/xfs_ksyms.c.diff?r1=text&tr1=1.51&r2=text&tr2=1.50&f=h linux-2.4/xfs_ksyms.c - 1.46 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.4/xfs_ksyms.c.diff?r1=text&tr1=1.46&r2=text&tr2=1.45&f=h From owner-xfs@oss.sgi.com Sun Sep 3 18:31:32 2006 Received: with ECARTIS (v1.0.0; list xfs); Sun, 03 Sep 2006 18:31:36 -0700 (PDT) Received: from omx1.americas.sgi.com (omx1.americas.sgi.com [198.149.16.13]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id k841VLDW011078 for ; Sun, 3 Sep 2006 18:31:31 -0700 Received: from internal-mail-relay1.corp.sgi.com (internal-mail-relay1.corp.sgi.com [198.149.32.52]) by omx1.americas.sgi.com (8.12.10/8.12.9/linux-outbound_gateway-1.1) with ESMTP id k840cDnx017901 for ; Sun, 3 Sep 2006 19:38:13 -0500 Received: from [134.15.160.1] (vpn-emea-sw-emea-160-1.emea.sgi.com [134.15.160.1]) by internal-mail-relay1.corp.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id k840bq8s36682405; Sun, 3 Sep 2006 17:37:52 -0700 (PDT) Message-ID: <44FB75CB.8050809@sgi.com> Date: Mon, 04 Sep 2006 01:39:39 +0100 From: Lachlan McIlroy 
Reply-To: lachlan@sgi.com Organization: SGI User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.7.12) Gecko/20050920 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Nathan Scott CC: xfs@oss.sgi.com Subject: Re: review: minor cleanup in xfs_read locking References: <20060904101711.A3331169@wobbly.melbourne.sgi.com> In-Reply-To: <20060904101711.A3331169@wobbly.melbourne.sgi.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 8874 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs Content-Length: 354 Lines: 13 Looking a little closer... you could probably do away with the extra pair of parentheses in the call to unlikely(). Nathan Scott wrote: > Hi Lachlan, > > Could you check this for me - it just folds the second direct I/O > conditional added in your recent deadlock fix back into the prior > branch, which is also direct I/O specific... > > thanks. 
> From owner-xfs@oss.sgi.com Sun Sep 3 18:37:37 2006 Received: with ECARTIS (v1.0.0; list xfs); Sun, 03 Sep 2006 18:37:44 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id k841bODW012301 for ; Sun, 3 Sep 2006 18:37:36 -0700 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA19011 for ; Mon, 4 Sep 2006 11:36:41 +1000 Received: by chook.melbourne.sgi.com (Postfix, from userid 16302) id AB70258CF851; Mon, 4 Sep 2006 11:36:40 +1000 (EST) To: linux-xfs@oss.sgi.com Subject: TAKE 955696 - cleanup, xfs_read Message-Id: <20060904013640.AB70258CF851@chook.melbourne.sgi.com> Date: Mon, 4 Sep 2006 11:36:40 +1000 (EST) From: nathans@sgi.com (Nathan Scott) X-archive-position: 8876 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: nathans@sgi.com Precedence: bulk X-list: xfs Content-Length: 480 Lines: 14 Minor cleanup from dio locking fix, remove an extra conditional. 
Date: Mon Sep 4 11:36:19 AEST 2006 Workarea: chook.melbourne.sgi.com:/build/nathans/xfs-linux Inspected by: lachlan The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/xfs-kern/xfs-linux-melb Modid: xfs-linux-melb:xfs-kern:26908a linux-2.6/xfs_lrw.c - 1.250 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.6/xfs_lrw.c.diff?r1=text&tr1=1.250&r2=text&tr2=1.249&f=h From owner-xfs@oss.sgi.com Mon Sep 4 04:24:08 2006 Received: with ECARTIS (v1.0.0; list xfs); Mon, 04 Sep 2006 04:24:19 -0700 (PDT) Received: from imr2.americas.sgi.com (imr2.americas.sgi.com [198.149.16.18]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id k84BNvDW004644 for ; Mon, 4 Sep 2006 04:24:08 -0700 Received: from [134.15.160.13] (vpn-emea-sw-emea-160-13.emea.sgi.com [134.15.160.13]) by imr2.americas.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id k84BNqDu51075438 for ; Mon, 4 Sep 2006 04:23:53 -0700 (PDT) Message-ID: <44FC0D0F.60403@sgi.com> Date: Mon, 04 Sep 2006 12:25:03 +0100 From: Lachlan McIlroy Reply-To: lachlan@sgi.com Organization: SGI User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.7.12) Gecko/20050920 X-Accept-Language: en-us, en MIME-Version: 1.0 To: xfs@oss.sgi.com Subject: review: propogate return codes from flush routines Content-Type: multipart/mixed; boundary="------------070105020404090905000704" X-archive-position: 8880 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: lachlan@sgi.com Precedence: bulk X-list: xfs Content-Length: 10601 Lines: 319 This is a multi-part message in MIME format. --------------070105020404090905000704 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Here's a patch to handle error return values in fs_flush_pages and fs_flushinval_pages. It changes the prototype of fs_flushinval_pages so we can propagate the errors and handle them at higher layers. 
I also modified xfs_itruncate_start so that it could propagate the error further. I've changed the necessary prototypes on 2.4 to keep the build happy but haven't bothered to fix the error handling in fs_flush_pages or fs_flushinval_pages for 2.4. The motivation behind this change was the recent BUG reported due to a direct I/O read trying to write to delayed alloc extents. While the exact cause of this problem is not known, it is possible that fs_flushinval_pages ignored an error while flushing, truncated the pages on the file anyway, and failed to convert all delayed alloc extents. Lachlan --------------070105020404090905000704 Content-Type: text/plain; name="flush.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="flush.patch" --- fs/xfs/linux-2.4/xfs_fs_subr.c_1.48 2006-09-04 11:55:28.000000000 +0100 +++ fs/xfs/linux-2.4/xfs_fs_subr.c 2006-09-04 11:54:38.000000000 +0100 @@ -35,7 +35,7 @@ truncate_inode_pages(ip->i_mapping, first); } -void +int fs_flushinval_pages( bhv_desc_t *bdp, xfs_off_t first, @@ -53,6 +53,7 @@ filemap_fdatawait(ip->i_mapping); truncate_inode_pages(ip->i_mapping, first); } + return 0; } int --- fs/xfs/linux-2.4/xfs_fs_subr.h_1.17 2006-09-04 11:55:52.000000000 +0100 +++ fs/xfs/linux-2.4/xfs_fs_subr.h 2006-09-04 11:55:24.000000000 +0100 @@ -23,7 +23,7 @@ extern int fs_nosys(void); extern void fs_noval(void); extern void fs_tosspages(bhv_desc_t *, xfs_off_t, xfs_off_t, int); -extern void fs_flushinval_pages(bhv_desc_t *, xfs_off_t, xfs_off_t, int); +extern int fs_flushinval_pages(bhv_desc_t *, xfs_off_t, xfs_off_t, int); extern int fs_flush_pages(bhv_desc_t *, xfs_off_t, xfs_off_t, uint64_t, int); #endif /* __XFS_FS_SUBR_H__ */ --- fs/xfs/linux-2.4/xfs_vnode.h_1.113 2006-09-04 11:56:17.000000000 +0100 +++ fs/xfs/linux-2.4/xfs_vnode.h 2006-09-04 11:56:51.000000000 +0100 @@ -183,7 +183,7 @@ typedef void (*vop_link_removed_t)(bhv_desc_t *, bhv_vnode_t *, int); typedef void (*vop_vnode_change_t)(bhv_desc_t *, 
bhv_vchange_t, __psint_t); typedef void (*vop_ptossvp_t)(bhv_desc_t *, xfs_off_t, xfs_off_t, int); -typedef void (*vop_pflushinvalvp_t)(bhv_desc_t *, xfs_off_t, xfs_off_t, int); +typedef int (*vop_pflushinvalvp_t)(bhv_desc_t *, xfs_off_t, xfs_off_t, int); typedef int (*vop_pflushvp_t)(bhv_desc_t *, xfs_off_t, xfs_off_t, uint64_t, int); typedef int (*vop_iflush_t)(bhv_desc_t *, int); --- fs/xfs/linux-2.6/xfs_fs_subr.c_1.47 2006-09-01 16:34:01.000000000 +0100 +++ fs/xfs/linux-2.6/xfs_fs_subr.c 2006-09-01 16:36:00.000000000 +0100 @@ -35,7 +35,7 @@ truncate_inode_pages(ip->i_mapping, first); } -void +int fs_flushinval_pages( bhv_desc_t *bdp, xfs_off_t first, @@ -44,13 +44,16 @@ { bhv_vnode_t *vp = BHV_TO_VNODE(bdp); struct inode *ip = vn_to_inode(vp); + int ret = 0; if (VN_CACHED(vp)) { if (VN_TRUNC(vp)) VUNTRUNCATE(vp); - filemap_write_and_wait(ip->i_mapping); - truncate_inode_pages(ip->i_mapping, first); + ret = filemap_write_and_wait(ip->i_mapping); + if (!ret) + truncate_inode_pages(ip->i_mapping, first); } + return ret; } int @@ -63,14 +66,14 @@ { bhv_vnode_t *vp = BHV_TO_VNODE(bdp); struct inode *ip = vn_to_inode(vp); + int ret = 0; if (VN_DIRTY(vp)) { if (VN_TRUNC(vp)) VUNTRUNCATE(vp); - filemap_fdatawrite(ip->i_mapping); - if (flags & XFS_B_ASYNC) - return 0; - filemap_fdatawait(ip->i_mapping); + ret = filemap_fdatawrite(ip->i_mapping); + if (!ret && !(flags & XFS_B_ASYNC)) + ret = filemap_fdatawait(ip->i_mapping); } - return 0; + return ret; } --- fs/xfs/linux-2.6/xfs_fs_subr.h_1.13 2006-09-01 18:24:35.000000000 +0100 +++ fs/xfs/linux-2.6/xfs_fs_subr.h 2006-09-01 17:08:16.000000000 +0100 @@ -23,7 +23,7 @@ extern int fs_nosys(void); extern void fs_noval(void); extern void fs_tosspages(bhv_desc_t *, xfs_off_t, xfs_off_t, int); -extern void fs_flushinval_pages(bhv_desc_t *, xfs_off_t, xfs_off_t, int); +extern int fs_flushinval_pages(bhv_desc_t *, xfs_off_t, xfs_off_t, int); extern int fs_flush_pages(bhv_desc_t *, xfs_off_t, xfs_off_t, uint64_t, int); #endif /* 
__XFS_FS_SUBR_H__ */ --- fs/xfs/linux-2.6/xfs_lrw.c_1.250 2006-09-04 11:03:51.000000000 +0100 +++ fs/xfs/linux-2.6/xfs_lrw.c 2006-09-04 11:05:20.000000000 +0100 @@ -200,7 +200,7 @@ struct file *file = iocb->ki_filp; struct inode *inode = file->f_mapping->host; size_t size = 0; - ssize_t ret; + ssize_t ret = 0; xfs_fsize_t n; xfs_inode_t *ip; xfs_mount_t *mp; @@ -272,9 +272,13 @@ if (unlikely(ioflags & IO_ISDIRECT)) { if (VN_CACHED(vp)) - bhv_vop_flushinval_pages(vp, ctooff(offtoct(*offset)), + ret = bhv_vop_flushinval_pages(vp, ctooff(offtoct(*offset)), -1, FI_REMAPF_LOCKED); mutex_unlock(&inode->i_mutex); + if (ret) { + xfs_iunlock(ip, XFS_IOLOCK_SHARED); + return ret; + } } xfs_rw_enter_trace(XFS_READ_ENTER, &ip->i_iocore, @@ -802,8 +806,10 @@ if (need_flush) { xfs_inval_cached_trace(io, pos, -1, ctooff(offtoct(pos)), -1); - bhv_vop_flushinval_pages(vp, ctooff(offtoct(pos)), + error = bhv_vop_flushinval_pages(vp, ctooff(offtoct(pos)), -1, FI_REMAPF_LOCKED); + if (error) + goto out_unlock_internal; } if (need_i_mutex) { --- fs/xfs/linux-2.6/xfs_vnode.h_1.125 2006-09-01 18:11:19.000000000 +0100 +++ fs/xfs/linux-2.6/xfs_vnode.h 2006-09-01 18:12:32.000000000 +0100 @@ -196,7 +196,7 @@ typedef void (*vop_link_removed_t)(bhv_desc_t *, bhv_vnode_t *, int); typedef void (*vop_vnode_change_t)(bhv_desc_t *, bhv_vchange_t, __psint_t); typedef void (*vop_ptossvp_t)(bhv_desc_t *, xfs_off_t, xfs_off_t, int); -typedef void (*vop_pflushinvalvp_t)(bhv_desc_t *, xfs_off_t, xfs_off_t, int); +typedef int (*vop_pflushinvalvp_t)(bhv_desc_t *, xfs_off_t, xfs_off_t, int); typedef int (*vop_pflushvp_t)(bhv_desc_t *, xfs_off_t, xfs_off_t, uint64_t, int); typedef int (*vop_iflush_t)(bhv_desc_t *, int); --- fs/xfs/xfs_dfrag.c_1.55 2006-09-01 18:25:24.000000000 +0100 +++ fs/xfs/xfs_dfrag.c 2006-09-01 16:46:00.000000000 +0100 @@ -200,7 +200,9 @@ if (VN_CACHED(tvp) != 0) { xfs_inval_cached_trace(&tip->i_iocore, 0, -1, 0, -1); - bhv_vop_flushinval_pages(tvp, 0, -1, FI_REMAPF_LOCKED); + error = 
bhv_vop_flushinval_pages(tvp, 0, -1, FI_REMAPF_LOCKED); + if (error) + goto error0; } /* Verify O_DIRECT for ftmp */ --- fs/xfs/xfs_inode.c_1.451 2006-09-01 18:25:49.000000000 +0100 +++ fs/xfs/xfs_inode.c 2006-09-01 16:52:40.000000000 +0100 @@ -1421,7 +1421,7 @@ * must be called again with all the same restrictions as the initial * call. */ -void +int xfs_itruncate_start( xfs_inode_t *ip, uint flags, @@ -1431,6 +1431,7 @@ xfs_off_t toss_start; xfs_mount_t *mp; bhv_vnode_t *vp; + int error = 0; ASSERT(ismrlocked(&ip->i_iolock, MR_UPDATE) != 0); ASSERT((new_size == 0) || (new_size <= ip->i_d.di_size)); @@ -1468,7 +1469,7 @@ * file size, so there is no way that the data extended * out there. */ - return; + return 0; } last_byte = xfs_file_last_byte(ip); xfs_itrunc_trace(XFS_ITRUNC_START, ip, flags, new_size, toss_start, @@ -1477,7 +1478,7 @@ if (flags & XFS_ITRUNC_DEFINITE) { bhv_vop_toss_pages(vp, toss_start, -1, FI_REMAPF_LOCKED); } else { - bhv_vop_flushinval_pages(vp, toss_start, -1, FI_REMAPF_LOCKED); + error = bhv_vop_flushinval_pages(vp, toss_start, -1, FI_REMAPF_LOCKED); } } @@ -1486,6 +1487,7 @@ ASSERT(VN_CACHED(vp) == 0); } #endif + return error; } /* --- fs/xfs/xfs_inode.h_1.215 2006-09-01 18:26:15.000000000 +0100 +++ fs/xfs/xfs_inode.h 2006-09-01 16:53:11.000000000 +0100 @@ -439,7 +439,7 @@ uint xfs_dic2xflags(struct xfs_dinode_core *); int xfs_ifree(struct xfs_trans *, xfs_inode_t *, struct xfs_bmap_free *); -void xfs_itruncate_start(xfs_inode_t *, uint, xfs_fsize_t); +int xfs_itruncate_start(xfs_inode_t *, uint, xfs_fsize_t); int xfs_itruncate_finish(struct xfs_trans **, xfs_inode_t *, xfs_fsize_t, int, int); int xfs_iunlink(struct xfs_trans *, xfs_inode_t *); --- fs/xfs/xfs_utils.c_1.72 2006-09-01 18:26:39.000000000 +0100 +++ fs/xfs/xfs_utils.c 2006-09-01 16:55:21.000000000 +0100 @@ -420,7 +420,11 @@ * in a transaction. 
*/ xfs_ilock(ip, XFS_IOLOCK_EXCL); - xfs_itruncate_start(ip, XFS_ITRUNC_DEFINITE, (xfs_fsize_t)0); + error = xfs_itruncate_start(ip, XFS_ITRUNC_DEFINITE, (xfs_fsize_t)0); + if (error) { + xfs_iunlock(ip, XFS_IOLOCK_EXCL); + return error; + } tp = xfs_trans_alloc(mp, XFS_TRANS_TRUNCATE_FILE); if ((error = xfs_trans_reserve(tp, 0, XFS_ITRUNCATE_LOG_RES(mp), 0, --- fs/xfs/xfs_vfsops.c_1.511 2006-09-04 11:06:00.000000000 +0100 +++ fs/xfs/xfs_vfsops.c 2006-09-04 11:00:50.000000000 +0100 @@ -1150,7 +1150,7 @@ if (XFS_FORCED_SHUTDOWN(mp)) { bhv_vop_toss_pages(vp, 0, -1, FI_REMAPF); } else { - bhv_vop_flushinval_pages(vp, 0, -1, FI_REMAPF); + error = bhv_vop_flushinval_pages(vp, 0, -1, FI_REMAPF); } xfs_ilock(ip, XFS_ILOCK_SHARED); --- fs/xfs/xfs_vnodeops.c_1.682 2006-09-01 18:27:04.000000000 +0100 +++ fs/xfs/xfs_vnodeops.c 2006-09-04 01:11:37.000000000 +0100 @@ -1258,8 +1258,12 @@ * do that within a transaction. */ xfs_ilock(ip, XFS_IOLOCK_EXCL); - xfs_itruncate_start(ip, XFS_ITRUNC_DEFINITE, + error = xfs_itruncate_start(ip, XFS_ITRUNC_DEFINITE, ip->i_d.di_size); + if (error) { + xfs_iunlock(ip, XFS_IOLOCK_EXCL); + return error; + } error = xfs_trans_reserve(tp, 0, XFS_ITRUNCATE_LOG_RES(mp), @@ -1676,7 +1680,11 @@ */ xfs_ilock(ip, XFS_IOLOCK_EXCL); - xfs_itruncate_start(ip, XFS_ITRUNC_DEFINITE, 0); + error = xfs_itruncate_start(ip, XFS_ITRUNC_DEFINITE, 0); + if (error) { + xfs_iunlock(ip, XFS_IOLOCK_EXCL); + return VN_INACTIVE_CACHE; + } error = xfs_trans_reserve(tp, 0, XFS_ITRUNCATE_LOG_RES(mp), @@ -4332,8 +4340,10 @@ if (VN_CACHED(vp) != 0) { xfs_inval_cached_trace(&ip->i_iocore, ioffset, -1, ctooff(offtoct(ioffset)), -1); - bhv_vop_flushinval_pages(vp, ctooff(offtoct(ioffset)), + error = bhv_vop_flushinval_pages(vp, ctooff(offtoct(ioffset)), -1, FI_REMAPF_LOCKED); + if (error) + goto out_unlock_iolock; } /* --------------070105020404090905000704-- From owner-xfs@oss.sgi.com Mon Sep 4 19:00:52 2006 Received: with ECARTIS (v1.0.0; list xfs); Mon, 04 Sep 2006 19:01:15 
-0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id k8520dDW005141 for ; Mon, 4 Sep 2006 19:00:50 -0700 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA19420; Tue, 5 Sep 2006 11:59:49 +1000 Received: by chook.melbourne.sgi.com (Postfix, from userid 16346) id 334BF58CF851; Tue, 5 Sep 2006 11:59:49 +1000 (EST) To: linux-xfs@oss.sgi.com, sgi.bugs.xfs@engr.sgi.com Subject: TAKE 955939 - writing by splice() doesn't work in 2.6.17+ Message-Id: <20060905015949.334BF58CF851@chook.melbourne.sgi.com> Date: Tue, 5 Sep 2006 11:59:49 +1000 (EST) From: dgc@sgi.com (David Chinner) X-archive-position: 8882 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Content-Length: 811 Lines: 22 Fix xfs_splice_write() so appended data gets to disk. xfs_splice_write() failed to update the on disk inode size when extending the file, so when the file was closed, the range extended by splice was truncated off. Hence any region of a file written to by splice would end up as a hole full of zeros. Date: Tue Sep 5 11:58:45 AEST 2006 Workarea: chook.melbourne.sgi.com:/build/dgc/isms/2.6.x-xfs Inspected by: lachlan The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/linux/2.6.x-xfs-melb Modid: xfs-linux-melb:xfs-kern:26920a fs/xfs/linux-2.6/xfs_lrw.c - 1.251 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.6/xfs_lrw.c.diff?r1=text&tr1=1.251&r2=text&tr2=1.250&f=h - Update xfs inode size if xfs_splice_write is writing beyond the end of the current file. 
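The failure described in this thread can be exercised from user space with a copy that pushes data through a pipe using splice(2). The following is only a minimal sketch of that idea, not the splice-cp source: splice_copy() is a hypothetical helper name, the 64 KiB chunk size is an arbitrary choice, and it assumes a Linux kernel with splice support (2.6.17 or later) and a C library that declares splice() in <fcntl.h> under _GNU_SOURCE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * splice_copy() is a hypothetical helper for illustration only.
 * It moves data src -> pipe -> dst, so the write side goes through the
 * filesystem's splice write path (xfs_splice_write() on the XFS code
 * discussed above). Returns 0 on success, -1 on any failure.
 */
static int splice_copy(const char *src, const char *dst)
{
	int in, out, pfd[2], ret = -1;

	assert(src != NULL && dst != NULL);

	in = open(src, O_RDONLY);
	if (in < 0)
		return -1;
	out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (out < 0)
		goto close_in;
	if (pipe(pfd) < 0)
		goto close_out;

	for (;;) {
		/* splice-in: source file pages into the pipe */
		ssize_t n = splice(in, NULL, pfd[1], NULL, 65536, SPLICE_F_MOVE);

		if (n < 0)
			goto close_pipe;
		if (n == 0)
			break;	/* end of the source file */

		/* splice-out: drain the pipe into the destination file */
		while (n > 0) {
			ssize_t w = splice(pfd[0], NULL, out, NULL, (size_t)n,
					   SPLICE_F_MOVE);

			if (w <= 0)
				goto close_pipe;
			n -= w;
		}
	}
	ret = 0;

close_pipe:
	close(pfd[0]);
	close(pfd[1]);
close_out:
	if (close(out) < 0)
		ret = -1;
close_in:
	close(in);
	return ret;
}
```

Calling splice_copy("README", "foo") and then comparing the two files with md5sum, as done earlier in the thread, is one way to check the write side; on an affected kernel the destination would be expected to have the right length but read back as zeros.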
From owner-xfs@oss.sgi.com Tue Sep 5 00:54:51 2006 Received: with ECARTIS (v1.0.0; list xfs); Tue, 05 Sep 2006 00:55:13 -0700 (PDT) Received: from smtp3.adl2.internode.on.net (smtp3.adl2.internode.on.net [203.16.214.203]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id k857snDW006978 for ; Tue, 5 Sep 2006 00:54:50 -0700 Received: from saturn.flamingspork.com (ppp163-199.static.internode.on.net [150.101.163.199]) by smtp3.adl2.internode.on.net (8.13.6/8.13.5) with ESMTP id k857s9T9049644; Tue, 5 Sep 2006 17:24:10 +0930 (CST) (envelope-from stewart@flamingspork.com) Received: from localhost.localdomain (saturn.flamingspork.com [127.0.0.1]) by saturn.flamingspork.com (Postfix) with ESMTP id CFAAFC4055A; Tue, 5 Sep 2006 17:54:09 +1000 (EST) Received: by localhost.localdomain (Postfix, from userid 1000) id ACE8E147A386; Tue, 5 Sep 2006 17:54:09 +1000 (EST) Subject: Re: review: propogate return codes from flush routines From: Stewart Smith To: lachlan@sgi.com Cc: xfs@oss.sgi.com In-Reply-To: <44FC0D0F.60403@sgi.com> References: <44FC0D0F.60403@sgi.com> Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="=-4A1jMXZoNsX4cSbdR8hr" Date: Tue, 05 Sep 2006 17:54:08 +1000 Message-Id: <1157442848.5844.38.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.6.1 X-archive-position: 8884 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: stewart@flamingspork.com Precedence: bulk X-list: xfs Content-Length: 1630 Lines: 45 --=-4A1jMXZoNsX4cSbdR8hr Content-Type: text/plain Content-Transfer-Encoding: quoted-printable On Mon, 2006-09-04 at 12:25 +0100, Lachlan McIlroy wrote: > Here's a patch to handle error return values in fs_flush_pages and > fs_flushinval_pages. It changes the prototype of fs_flushinval_pages > so we can propogate the errors and handle them at higher layers. 
I also > modified xfs_itruncate_start so that it could propogate the error further. IMHO this is always a good idea. Although I guess the only concern can be getting the right error back (and a useful one). > The motivation behind this change was the recent BUG reported due to a > direct I/O read trying to write to delayed alloc extents. While the exact > cause of this problem is not known it is possible that fs_flushinval_pages > ignored an error while flushing, truncated the pages on the file anyway, > and failed to convert all delayed alloc extents. from a quick look the patch seems to do as advertised. i probably just haven't looked hard enough - but I'm assuming the layers higher up deal with the error and: report to user, write log message or something if there's a really catastrophic error? -- Stewart Smith (stewart@flamingspork.com) http://www.flamingspork.com/ --=-4A1jMXZoNsX4cSbdR8hr Content-Type: application/pgp-signature; name=signature.asc Content-Description: This is a digitally signed message part -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) iD8DBQBE/S0gKglWCUL+FDoRAmYOAKDWkSrawcugWkypcl3U+uhCnm9YtACeIlKY VxE8pVDXIERtN4mRxSKH/1o= =AvC+ -----END PGP SIGNATURE----- --=-4A1jMXZoNsX4cSbdR8hr-- From owner-xfs@oss.sgi.com Tue Sep 5 15:31:37 2006 Received: with ECARTIS (v1.0.0; list xfs); Tue, 05 Sep 2006 15:31:57 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id k85MVODW021811 for ; Tue, 5 Sep 2006 15:31:35 -0700 Received: from wobbly.melbourne.sgi.com (wobbly.melbourne.sgi.com [134.14.55.135]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id IAA16176; Wed, 6 Sep 2006 08:30:33 +1000 Received: from wobbly.melbourne.sgi.com (localhost [127.0.0.1]) by wobbly.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id k85MUVgw3383132; Wed, 6 Sep 2006 08:30:31 +1000 (EST) Received: (from nathans@localhost) 
by wobbly.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id k85MUSfP3385425; Wed, 6 Sep 2006 08:30:28 +1000 (EST) Date: Wed, 6 Sep 2006 08:30:28 +1000 From: Nathan Scott To: Chris Seufert Cc: xfs@oss.sgi.com Subject: Re: Kernel Ooops Message-ID: <20060906083028.I3365803@wobbly.melbourne.sgi.com> References: <2260b150609050427p3123cb85q5af484d8b907e6ac@mail.gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <2260b150609050427p3123cb85q5af484d8b907e6ac@mail.gmail.com>; from seufert@gmail.com on Tue, Sep 05, 2006 at 09:27:04PM +1000 X-archive-position: 8889 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: nathans@sgi.com Precedence: bulk X-list: xfs Content-Length: 412 Lines: 16 On Tue, Sep 05, 2006 at 09:27:04PM +1000, Chris Seufert wrote: > I have had this one before, and i had assumed it had been fixed. > > System seems very stable running ext3, so i dont 'think' its hardware > related, but i am begining to wonder. This is fixed, what kernel version are you using? (it was fixed in -rc5/6 IIRC). > RIP [] xfs_btree_init_cursor+0x48/0x1bd cheers. 
-- Nathan From owner-xfs@oss.sgi.com Tue Sep 5 15:35:59 2006 Received: with ECARTIS (v1.0.0; list xfs); Tue, 05 Sep 2006 15:36:15 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id k85MZkDW022405 for ; Tue, 5 Sep 2006 15:35:58 -0700 Received: from wobbly.melbourne.sgi.com (wobbly.melbourne.sgi.com [134.14.55.135]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id IAA16223; Wed, 6 Sep 2006 08:34:54 +1000 Received: from wobbly.melbourne.sgi.com (localhost [127.0.0.1]) by wobbly.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id k85MYogw3384341; Wed, 6 Sep 2006 08:34:51 +1000 (EST) Received: (from nathans@localhost) by wobbly.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id k85MYmtt3382131; Wed, 6 Sep 2006 08:34:48 +1000 (EST) Date: Wed, 6 Sep 2006 08:34:48 +1000 From: Nathan Scott To: Roger Willcocks Cc: xfs@oss.sgi.com Subject: race in xfs_rename? (fwd) Message-ID: <20060906083448.J3365803@wobbly.melbourne.sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i X-archive-position: 8890 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: nathans@xfs.org Precedence: bulk X-list: xfs Content-Length: 1100 Lines: 44 Hi Roger, I'm gonna be rude and fwd your mail to the list - in the hope someone there will be able to help you. I'm running out of time @sgi and have a bunch of stuff still to get done before I skip outta here - having to look at the xfs_rename locking right now might just be enough to make my head explode. ;) cheers. ----- Forwarded message from Roger Willcocks ----- Date: 05 Sep 2006 14:30:30 +0100 To: nathans@sgi.com X-Mailer: Ximian Evolution 1.2.2 (1.2.2-4) From: Roger Willcocks Subject: race in xfs_rename? 
Hi Nathan, I think I must be missing something here: xfs_rename calls xfs_lock_for_rename, which i-locks the source file and directory, target directory, and (if it already exists) the target file. It returns a two-to-four entry list of participating inodes. xfs_rename unlocks them all, creates a transaction, and then locks them all again. Surely while they're unlocked, another processor could jump in and fiddle with the underlying files and directories? -- Roger ----- End forwarded message ----- -- Nathan From owner-xfs@oss.sgi.com Tue Sep 5 16:15:27 2006 Received: with ECARTIS (v1.0.0; list xfs); Tue, 05 Sep 2006 16:15:47 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id k85NFEDW028040 for ; Tue, 5 Sep 2006 16:15:25 -0700 Received: from wobbly.melbourne.sgi.com (wobbly.melbourne.sgi.com [134.14.55.135]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA16936; Wed, 6 Sep 2006 09:14:18 +1000 Received: from wobbly.melbourne.sgi.com (localhost [127.0.0.1]) by wobbly.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id k85NEDgw3385030; Wed, 6 Sep 2006 09:14:14 +1000 (EST) Received: (from nathans@localhost) by wobbly.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id k85NE8rc3385132; Wed, 6 Sep 2006 09:14:08 +1000 (EST) Date: Wed, 6 Sep 2006 09:14:08 +1000 From: Nathan Scott To: Richard Knutsson Cc: akpm@osdl.org, xfs@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: [xfs-masters] Re: [PATCH 2.6.18-rc4-mm3 2/2] fs/xfs: Converting into generic boolean Message-ID: <20060906091407.M3365803@wobbly.melbourne.sgi.com> References: <44F833C9.1000208@student.ltu.se> <20060904150241.I3335706@wobbly.melbourne.sgi.com> <44FBFEE9.4010201@student.ltu.se> <20060905130557.A3334712@wobbly.melbourne.sgi.com> <44FD71C6.20006@student.ltu.se> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i 
In-Reply-To: <44FD71C6.20006@student.ltu.se>; from ricknu-0@student.ltu.se on Tue, Sep 05, 2006 at 02:47:02PM +0200 X-archive-position: 8891 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: nathans@sgi.com Precedence: bulk X-list: xfs Content-Length: 1049 Lines: 36 On Tue, Sep 05, 2006 at 02:47:02PM +0200, Richard Knutsson wrote: > Just the notion: "your" guys was the ones to make those to boolean(_t), Sort of, we actually inherited that type from IRIX where it is defined in . > and now you seem to want to patch them away because I tried to make them > more general. Nah, I just don't see the value either way, and see it as another code churn exercise. > So, is the: > B_FALSE -> false > B_TRUE -> true > ok by you? Personally, no. Thats code churn with no value IMO. > >"int needflush;" is just as readable (some would argue moreso) as > >"bool needflush;" and thats pretty much the level of use in XFS - > > > How are you sure "needflush" is, for example, not a counter? Well, that would be named "flushcount" or some such thing. And you would be able to tell that it was a counter by the way its used in the surrounding code. This discussion really isn't going anywhere useful; I think you need to accept that not everyone sees value in a boolean type. :) cheers. 
-- Nathan From owner-xfs@oss.sgi.com Tue Sep 5 16:31:02 2006 Received: with ECARTIS (v1.0.0; list xfs); Tue, 05 Sep 2006 16:31:17 -0700 (PDT) Received: from wx-out-0506.google.com (wx-out-0506.google.com [66.249.82.229]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id k85NV1DW002138 for ; Tue, 5 Sep 2006 16:31:02 -0700 Received: by wx-out-0506.google.com with SMTP id h29so2348870wxd for ; Tue, 05 Sep 2006 16:30:26 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=T3mdH0vnK4Ux55XJTY2Bw2bGJ1Ab5iexjJSzvGsLOJiw1dtRjhT3rDIRPpB0sCyya2DtkIa1yCELhrL0V6TogKbGASKduJ734+C9+eOdUInSCNcE0u6oFnDUmixF+/927aqxngvVQ0zBTNMcaHZ/971Sy3+pzAEkjdZWEF6xYs8= Received: by 10.70.74.1 with SMTP id w1mr10995316wxa; Tue, 05 Sep 2006 16:30:25 -0700 (PDT) Received: by 10.70.20.10 with HTTP; Tue, 5 Sep 2006 16:30:25 -0700 (PDT) Message-ID: <2260b150609051630w311dcedfgca19fb3e1cd41f95@mail.gmail.com> Date: Wed, 6 Sep 2006 09:30:25 +1000 From: "Chris Seufert" To: "Nathan Scott" Subject: Re: Kernel Ooops Cc: xfs@oss.sgi.com In-Reply-To: <20060906083028.I3365803@wobbly.melbourne.sgi.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <2260b150609050427p3123cb85q5af484d8b907e6ac@mail.gmail.com> <20060906083028.I3365803@wobbly.melbourne.sgi.com> X-archive-position: 8892 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: seufert@gmail.com Precedence: bulk X-list: xfs Content-Length: 582 Lines: 22 Just installed -rc5, all seems well. Should i be running a fsck after these types of errors? On 9/6/06, Nathan Scott wrote: > On Tue, Sep 05, 2006 at 09:27:04PM +1000, Chris Seufert wrote: > > I have had this one before, and i had assumed it had been fixed. 
> > > > System seems very stable running ext3, so i dont 'think' its hardware > > related, but i am begining to wonder. > > This is fixed, what kernel version are you using? (it was fixed > in -rc5/6 IIRC). > > > RIP [] xfs_btree_init_cursor+0x48/0x1bd > > cheers. > > -- > Nathan > From owner-xfs@oss.sgi.com Tue Sep 5 16:32:41 2006 Received: with ECARTIS (v1.0.0; list xfs); Tue, 05 Sep 2006 16:32:57 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id k85NWRDW002341 for ; Tue, 5 Sep 2006 16:32:39 -0700 Received: from wobbly.melbourne.sgi.com (wobbly.melbourne.sgi.com [134.14.55.135]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA17367; Wed, 6 Sep 2006 09:31:38 +1000 Received: from wobbly.melbourne.sgi.com (localhost [127.0.0.1]) by wobbly.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id k85NVZgw3364836; Wed, 6 Sep 2006 09:31:35 +1000 (EST) Received: (from nathans@localhost) by wobbly.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id k85NVWWP3388392; Wed, 6 Sep 2006 09:31:32 +1000 (EST) Date: Wed, 6 Sep 2006 09:31:32 +1000 From: Nathan Scott To: Chris Seufert Cc: xfs@oss.sgi.com Subject: Re: Kernel Ooops Message-ID: <20060906093132.A3385910@wobbly.melbourne.sgi.com> References: <2260b150609050427p3123cb85q5af484d8b907e6ac@mail.gmail.com> <20060906083028.I3365803@wobbly.melbourne.sgi.com> <2260b150609051630w311dcedfgca19fb3e1cd41f95@mail.gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <2260b150609051630w311dcedfgca19fb3e1cd41f95@mail.gmail.com>; from seufert@gmail.com on Wed, Sep 06, 2006 at 09:30:25AM +1000 X-archive-position: 8893 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: nathans@sgi.com Precedence: bulk X-list: xfs Content-Length: 212 Lines: 14 On Wed, Sep 06, 2006 at 
09:30:25AM +1000, Chris Seufert wrote: > Just installed -rc5, all seems well. Great. > Should i be running a fsck after these types of errors? Its not needed, no. cheers. -- Nathan From owner-xfs@oss.sgi.com Tue Sep 5 16:41:16 2006 Received: with ECARTIS (v1.0.0; list xfs); Tue, 05 Sep 2006 16:41:25 -0700 (PDT) Received: from wx-out-0506.google.com (wx-out-0506.google.com [66.249.82.236]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id k85NfFDW004087 for ; Tue, 5 Sep 2006 16:41:15 -0700 Received: by wx-out-0506.google.com with SMTP id h29so2351472wxd for ; Tue, 05 Sep 2006 16:40:41 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition; b=FAT7voeByA4avheKvdU2pG6/KUPHIWwVW7uk8qbZMhN097c1H0VvZJk9RiyJTxCf0N34nZUAKZ1QsUSzal9Jw8Zw3KlF4IXuO7Vi3lfOGFIVxi6ACb7/KJnR7wB9WZG05ONWBKaB5dplSipZcRuF3ZN097EihEkOab9Vr/HPIbw= Received: by 10.70.38.19 with SMTP id l19mr10859288wxl; Tue, 05 Sep 2006 16:40:41 -0700 (PDT) Received: by 10.70.20.10 with HTTP; Tue, 5 Sep 2006 16:40:41 -0700 (PDT) Message-ID: <2260b150609051640y288629cbtcbc133d05b2b40dd@mail.gmail.com> Date: Wed, 6 Sep 2006 09:40:41 +1000 From: "Chris Seufert" To: xfs@oss.sgi.com Subject: XFS Journal on md device MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 8894 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: seufert@gmail.com Precedence: bulk X-list: xfs Content-Length: 555 Lines: 16 Hey, I currently have a 2.1Tb xfs partition, sitting on a hardware SATA raid card. However i also have 2 hdd's (for OS etc) in software raid1, with a md device for the xfs log file. 
Its a 100mb RAID1 (under /dev/md4), now when i halt/reboot the box, even after the xfs partition is unmounted (as part of the shutdown sequence as normal on debian etch) the /dev/md4 device cant be cleanly stopped. Is having the log on a redundant partition a good idea or is it better to leave it as an internal log or is there another way round this problem. -Chris From owner-xfs@oss.sgi.com Tue Sep 5 17:16:59 2006 Received: with ECARTIS (v1.0.0; list xfs); Tue, 05 Sep 2006 17:17:16 -0700 (PDT) Received: from gepetto.dc.ltu.se (gepetto.dc.ltu.se [130.240.42.40]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id k860GtDW008712 for ; Tue, 5 Sep 2006 17:16:59 -0700 Received: from [130.240.205.31] (thinktank.campus.luth.se [130.240.205.31]) by gepetto.dc.ltu.se (8.12.5/8.12.5) with ESMTP id k860GBp9024412; Wed, 6 Sep 2006 02:16:11 +0200 (MEST) Message-ID: <44FE14ED.3020605@student.ltu.se> Date: Wed, 06 Sep 2006 02:23:09 +0200 From: Richard Knutsson User-Agent: Mozilla Thunderbird 1.0.8-1.1.fc4 (X11/20060501) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Nathan Scott CC: akpm@osdl.org, xfs@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: [xfs-masters] Re: [PATCH 2.6.18-rc4-mm3 2/2] fs/xfs: Converting into generic boolean References: <44F833C9.1000208@student.ltu.se> <20060904150241.I3335706@wobbly.melbourne.sgi.com> <44FBFEE9.4010201@student.ltu.se> <20060905130557.A3334712@wobbly.melbourne.sgi.com> <44FD71C6.20006@student.ltu.se> <20060906091407.M3365803@wobbly.melbourne.sgi.com> In-Reply-To: <20060906091407.M3365803@wobbly.melbourne.sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 8895 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: ricknu-0@student.ltu.se Precedence: bulk X-list: xfs Content-Length: 1148 Lines: 45 Nathan Scott wrote: >On Tue, Sep 05, 2006 at 02:47:02PM +0200, Richard Knutsson wrote: > > 
>>Just the notion: "your" guys was the ones to make those to boolean(_t), >> >> > >Sort of, we actually inherited that type from IRIX where it is >defined in . > > Oh, ok >>>"int needflush;" is just as readable (some would argue moreso) as >>>"bool needflush;" and thats pretty much the level of use in XFS - >>> >>> >>> >>How are you sure "needflush" is, for example, not a counter? >> >> > >Well, that would be named "flushcount" or some such thing. And you >would be able to tell that it was a counter by the way its used in >the surrounding code. > > True, thinking more of when you have a quick look at the headers, but "flushcount" would be a more logical name in such a case. >This discussion really isn't going anywhere useful; I think you need >to accept that not everyone sees value in a boolean type. :) > > Well, can you blame me for trying? ;) But the more important thing is to clean up the boolean-type and FALSE/TRUE mess in the kernel. >cheers. > > Thank you for your time and happy coding :) From owner-xfs@oss.sgi.com Tue Sep 5 19:32:35 2006 Received: with ECARTIS (v1.0.0; list xfs); Tue, 05 Sep 2006 19:32:56 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id k862WMDW026638 for ; Tue, 5 Sep 2006 19:32:34 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA20848; Wed, 6 Sep 2006 12:31:32 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id k862VTeQ12699673; Wed, 6 Sep 2006 12:31:30 +1000 (AEST) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id k862VSQT12697813; Wed, 6 Sep 2006 12:31:28 +1000 (AEST) Date: Wed, 6 Sep 2006 12:31:28 +1000 From: David Chinner To: Nathan Scott Cc: Roger Willcocks , xfs@oss.sgi.com Subject: Re: race in 
xfs_rename? (fwd) Message-ID: <20060906023128.GN10950339@melbourne.sgi.com> References: <20060906083448.J3365803@wobbly.melbourne.sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20060906083448.J3365803@wobbly.melbourne.sgi.com> User-Agent: Mutt/1.4.2.1i X-archive-position: 8896 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Content-Length: 1763 Lines: 52 On Wed, Sep 06, 2006 at 08:34:48AM +1000, Nathan Scott wrote: > Hi Roger, > > I'm gonna be rude and fwd your mail to the list - in the hope > someone there will be able to help you. I'm running out of time > @sgi and have a bunch of stuff still to get done before I skip > outta here - having to look at the xfs_rename locking right now > might just be enough to make my head explode. ;) > > cheers. > > ----- Forwarded message from Roger Willcocks ----- > > Date: 05 Sep 2006 14:30:30 +0100 > To: nathans@sgi.com > X-Mailer: Ximian Evolution 1.2.2 (1.2.2-4) > From: Roger Willcocks > Subject: race in xfs_rename? > > Hi Nathan, > > I think I must be missing something here: > > xfs_rename calls xfs_lock_for_rename, which i-locks the source file and > directory, target directory, and (if it already exists) the target file. > > It returns a two-to-four entry list of participating inodes. > > xfs_rename unlocks them all, creates a transaction, and then locks them > all again. > > Surely while they're unlocked, another processor could jump in and > fiddle with the underlying files and directories? I don't think that can happen due to i_mutex locking at the vfs layer i.e. in do_rename() via lock_rename() and in vfs_rename_{dir,other}(). Hence I think it is safe for XFS to do what it does. 
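The argument above can be sketched in user-space terms as below. This is only an illustration under simplifying assumptions, not kernel code: struct toy_dir, toy_modify() and toy_rename() are hypothetical names, a single "outer" pthread mutex stands in for the i_mutex locks taken at the VFS layer, and "inner" stands in for the XFS inode lock.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins; none of these names come from the kernel. */
struct toy_dir {
	pthread_mutex_t outer;	/* plays the role of the VFS i_mutex */
	pthread_mutex_t inner;	/* plays the role of the XFS inode lock */
	int generation;		/* bumped on every modification */
};

static void toy_init(struct toy_dir *d)
{
	pthread_mutex_init(&d->outer, NULL);
	pthread_mutex_init(&d->inner, NULL);
	d->generation = 0;
}

/* Every modifier takes the outer lock first, then the inner one. */
static void toy_modify(struct toy_dir *d)
{
	pthread_mutex_lock(&d->outer);
	pthread_mutex_lock(&d->inner);
	d->generation++;
	pthread_mutex_unlock(&d->inner);
	pthread_mutex_unlock(&d->outer);
}

/*
 * Rename-like operation: the inner lock is taken, dropped and retaken
 * while the outer lock is held throughout. Returns 1 if nothing changed
 * across the gap.
 */
static int toy_rename(struct toy_dir *d)
{
	int before, after;

	pthread_mutex_lock(&d->outer);		/* held by the "VFS" */

	pthread_mutex_lock(&d->inner);
	before = d->generation;
	pthread_mutex_unlock(&d->inner);	/* gap: inner lock dropped */

	/* ... a transaction would be set up here ... */

	pthread_mutex_lock(&d->inner);
	after = d->generation;
	pthread_mutex_unlock(&d->inner);

	pthread_mutex_unlock(&d->outer);
	return before == after;
}
```

Because toy_modify() cannot get past the outer lock while toy_rename() holds it, the gap between dropping and retaking the inner lock is harmless in this model. Whether the real VFS locking covers every inode involved in a rename is exactly the question being discussed, so treat this as a picture of the reasoning rather than a proof.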
FWIW, in IRIX where there is no higher layer locking, XFS has extra checks and locks (ancestor lock, inode generation count checks, etc) to ensure nothing changed when the locks were dropped and regained. AFAICT, the Linux XFS code doesn't need to do any of this because the VFS guarantees us that things won't change..... Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Tue Sep 5 20:54:09 2006 Received: with ECARTIS (v1.0.0; list xfs); Tue, 05 Sep 2006 20:54:15 -0700 (PDT) Received: from wx-out-0506.google.com (wx-out-0506.google.com [66.249.82.225]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id k863s8DW011764 for ; Tue, 5 Sep 2006 20:54:09 -0700 Received: by wx-out-0506.google.com with SMTP id h29so2415237wxd for ; Tue, 05 Sep 2006 20:53:31 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=DDjzkCJ/z0xetBUkGjnK+BHb2XkT9qVKNT0kTbHTLNRq/WHSd3+JJ9+wZQKRAhXLPRc/yS48kaXW68K+V1EZybT1qT7RkvBak2icRQJ+k4SQN1NrCQ6ysDCrF2rXSWZrB3QZnh8oABee/ZvByUspbp4u3pp8SOR2LBAJHSnWJNA= Received: by 10.70.99.11 with SMTP id w11mr11199434wxb; Tue, 05 Sep 2006 20:53:31 -0700 (PDT) Received: by 10.70.20.10 with HTTP; Tue, 5 Sep 2006 20:53:31 -0700 (PDT) Message-ID: <2260b150609052053h31731a0eycababfab603749c9@mail.gmail.com> Date: Wed, 6 Sep 2006 13:53:31 +1000 From: "Chris Seufert" To: "Chris Seufert" , "linux-xfs@oss.sgi.com" Subject: Re: XFS Journal on md device In-Reply-To: <20060906034027.GA7393@piper.madduck.net> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <2260b150609051640y288629cbtcbc133d05b2b40dd@mail.gmail.com> <44FE28A8.5000803@oss.sgi.com> <2260b150609051853w1286eda7ve59a5df2c7e0ae1c@mail.gmail.com> <20060906034027.GA7393@piper.madduck.net> 
X-archive-position: 8900 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: seufert@gmail.com Precedence: bulk X-list: xfs Content-Length: 1416 Lines: 45 I'm running Debian Testing (is that Etch), updated about a week ago, on AMD64, with Kernel 2.6.18-rc5-mm1 + md hotfix(1) with md built-in, no initrd, so raid autodetect works. My root (/) volume is ext3 running on /dev/md0 (RAID 1), the problem is with my /data volume thats xfs, running on /dev/sda, with log on /dev/md4. 1: The patch is required becase mm1 killed the KConfig for md devices. On 9/6/06, martin f krafft wrote: > also sprach Chris Seufert [2006.09.06.0353 +0200]: > > However on reboot xfs does a journal rebuild/repair. and the md > > does a re-sync of the md device. > > Which distro? > > I am the Debian maintainer for mdadm and have run into the problem > that the array used for / cannot be stopped until after / is > unmounted, at which point nothing stops the array for there is no > shutdownramfs. > > However, we (Debian) remount / read-only and I never see > a filesystem check on reboot. > > -- > martin; (greetings from the heart of the sun.) > \____ echo mailto: !#^."<*>"|tr "<*> mailto:" net@madduck > > spamtraps: madduck.bogus@madduck.net > > the micro$oft hoover: finally, a product that's supposed to suck! 
> > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.5 (GNU/Linux) > > iD8DBQFE/kMrIgvIgzMMSnURAoSgAJ4mQ8a1RH6sYd7VRn4yZsRNKxbeSACdEGFv > HeWvLK1N+R1nvxMfeqlDZk8= > =LyfV > -----END PGP SIGNATURE----- > > > From owner-xfs@oss.sgi.com Tue Sep 5 22:12:09 2006 Received: with ECARTIS (v1.0.0; list xfs); Tue, 05 Sep 2006 22:12:20 -0700 (PDT) Received: from albatross.madduck.net (armagnac.ifi.unizh.ch [130.60.75.72]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id k865C7DW022292 for ; Tue, 5 Sep 2006 22:12:08 -0700 Received: from localhost (albatross.madduck.net [127.0.0.1]) by albatross.madduck.net (postfix) with ESMTP id 72800895D7C for ; Wed, 6 Sep 2006 06:12:39 +0200 (CEST) Received: from albatross.madduck.net ([127.0.0.1]) by localhost (albatross.madduck.net [127.0.0.1]) (amavisd-new, port 10024) with LMTP id 16384-01 for ; Wed, 6 Sep 2006 06:12:39 +0200 (CEST) Received: from wall.oerlikon.madduck.net (84-72-21-226.dclient.hispeed.ch [84.72.21.226]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "wall.oerlikon.madduck.net", Issuer "CAcert Class 3 Root" (verified OK)) by albatross.madduck.net (postfix) with ESMTP id 2EF29895D79 for ; Wed, 6 Sep 2006 06:12:39 +0200 (CEST) Received: from piper.oerlikon.madduck.net (piper.oerlikon.madduck.net [192.168.14.3]) by wall.oerlikon.madduck.net (Postfix) with ESMTP id 911761804BBE for ; Wed, 6 Sep 2006 06:12:45 +0200 (CEST) Received: by piper.oerlikon.madduck.net (Postfix, from userid 1000) id 281D21043E50; Wed, 6 Sep 2006 06:12:45 +0200 (CEST) Date: Wed, 6 Sep 2006 06:12:45 +0200 From: martin f krafft To: "linux-xfs@oss.sgi.com" Subject: Re: XFS Journal on md device Message-ID: <20060906041245.GA10066@piper.madduck.net> Mail-Followup-To: "linux-xfs@oss.sgi.com" References: <2260b150609051640y288629cbtcbc133d05b2b40dd@mail.gmail.com> <44FE28A8.5000803@oss.sgi.com> <2260b150609051853w1286eda7ve59a5df2c7e0ae1c@mail.gmail.com> <20060906034027.GA7393@piper.madduck.net> 
<2260b150609052053h31731a0eycababfab603749c9@mail.gmail.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="ew6BAiZeqk4r7MaW" Content-Disposition: inline In-Reply-To: <2260b150609052053h31731a0eycababfab603749c9@mail.gmail.com> X-OS: Debian GNU/Linux testing/unstable kernel 2.6.17-2-amd64 x86_64 X-Motto: Keep the good times rollin' X-Subliminal-Message: debian/rules! X-Spamtrap: madduck.bogus@madduck.net User-Agent: Mutt/1.5.13 (2006-08-11) X-archive-position: 8901 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: madduck@madduck.net Precedence: bulk X-list: xfs Content-Length: 1595 Lines: 56 --ew6BAiZeqk4r7MaW Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable also sprach Chris Seufert [2006.09.06.0553 +0200]: > My root (/) volume is ext3 running on /dev/md0 (RAID 1), the > problem is with my /data volume thats xfs, running on /dev/sda, > with log on /dev/md4. Can you tell when /data gets umounted during the shutdown sequence? Correct me if I'm wrong, but once that happened, /dev/md4 should become free as far as XFS is concerned. However, since mdadm or the kernel fails to stop the device during shutdown, I am guessing that the partition is simply not being umounted. Does it have an entry in /etc/fstab? Try changing the

  #! /bin/sh

in line 1 of /etc/rc6.d/S40umountfs to

  #! /bin/sh -x
  exec 2> /root/umountfs.out.2

then reboot and paste that file somewhere (http://rafb.net/paste). -- martin; (greetings from the heart of the sun.) \____ echo mailto: !#^."<*>"|tr "<*> mailto:" net@madduck spamtraps: madduck.bogus@madduck.net "you don't sew with a fork, so I see no reason to eat with knitting needles." 
-- miss piggy, on eating chinese food --ew6BAiZeqk4r7MaW Content-Type: application/pgp-signature; name="signature.asc" Content-Description: Digital signature (GPG/PGP) Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) iD8DBQFE/kq9IgvIgzMMSnURAiPcAJ99QMFVoCCU3qG+cTDtAP7wtvm3dQCffFeW 1nM/lomxGvDkQSmGbdgBKRI= =fBYK -----END PGP SIGNATURE----- --ew6BAiZeqk4r7MaW-- From owner-xfs@oss.sgi.com Tue Sep 5 22:45:33 2006 Received: with ECARTIS (v1.0.0; list xfs); Tue, 05 Sep 2006 22:45:37 -0700 (PDT) Received: from albatross.madduck.net (armagnac.ifi.unizh.ch [130.60.75.72]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id k865jRDW027334 for ; Tue, 5 Sep 2006 22:45:32 -0700 Received: from localhost (albatross.madduck.net [127.0.0.1]) by albatross.madduck.net (postfix) with ESMTP id 28DE0895D7A; Wed, 6 Sep 2006 05:40:22 +0200 (CEST) Received: from albatross.madduck.net ([127.0.0.1]) by localhost (albatross.madduck.net [127.0.0.1]) (amavisd-new, port 10024) with LMTP id 08528-02; Wed, 6 Sep 2006 05:40:21 +0200 (CEST) Received: from wall.oerlikon.madduck.net (84-72-21-226.dclient.hispeed.ch [84.72.21.226]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "wall.oerlikon.madduck.net", Issuer "CAcert Class 3 Root" (verified OK)) by albatross.madduck.net (postfix) with ESMTP id D8A71895D79; Wed, 6 Sep 2006 05:40:21 +0200 (CEST) Received: from piper.oerlikon.madduck.net (piper.oerlikon.madduck.net [192.168.14.3]) by wall.oerlikon.madduck.net (Postfix) with ESMTP id 336F61804B98; Wed, 6 Sep 2006 05:40:28 +0200 (CEST) Received: by piper.oerlikon.madduck.net (Postfix, from userid 1000) id CC0BC1043E50; Wed, 6 Sep 2006 05:40:27 +0200 (CEST) Date: Wed, 6 Sep 2006 05:40:27 +0200 From: martin f krafft To: Chris Seufert Cc: "linux-xfs@oss.sgi.com" Subject: Re: XFS Journal on md device Message-ID: <20060906034027.GA7393@piper.madduck.net> Mail-Followup-To: Chris Seufert , "linux-xfs@oss.sgi.com" References: 
<2260b150609051640y288629cbtcbc133d05b2b40dd@mail.gmail.com> <44FE28A8.5000803@oss.sgi.com> <2260b150609051853w1286eda7ve59a5df2c7e0ae1c@mail.gmail.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="rwEMma7ioTxnRzrJ" Content-Disposition: inline In-Reply-To: <2260b150609051853w1286eda7ve59a5df2c7e0ae1c@mail.gmail.com> X-OS: Debian GNU/Linux testing/unstable kernel 2.6.17-2-amd64 x86_64 X-Motto: Keep the good times rollin' X-Subliminal-Message: debian/rules! X-Spamtrap: madduck.bogus@madduck.net User-Agent: Mutt/1.5.13 (2006-08-11) X-archive-position: 8902 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: madduck@madduck.net Precedence: bulk X-list: xfs Content-Length: 1230 Lines: 42 --rwEMma7ioTxnRzrJ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable also sprach Chris Seufert [2006.09.06.0353 +0200]: > However on reboot xfs does a journal rebuild/repair. and the md > does a re-sync of the md device. Which distro? I am the Debian maintainer for mdadm and have run into the problem that the array used for / cannot be stopped until after / is unmounted, at which point nothing stops the array for there is no shutdownramfs. However, we (Debian) remount / read-only and I never see a filesystem check on reboot. -- martin; (greetings from the heart of the sun.) \____ echo mailto: !#^."<*>"|tr "<*> mailto:" net@madduck spamtraps: madduck.bogus@madduck.net the micro$oft hoover: finally, a product that's supposed to suck! 
--rwEMma7ioTxnRzrJ Content-Type: application/pgp-signature; name="signature.asc" Content-Description: Digital signature (GPG/PGP) Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) iD8DBQFE/kMrIgvIgzMMSnURAoSgAJ4mQ8a1RH6sYd7VRn4yZsRNKxbeSACdEGFv HeWvLK1N+R1nvxMfeqlDZk8= =LyfV -----END PGP SIGNATURE----- --rwEMma7ioTxnRzrJ-- From owner-xfs@oss.sgi.com Wed Sep 6 05:58:11 2006 Received: with ECARTIS (v1.0.0; list xfs); Wed, 06 Sep 2006 05:58:21 -0700 (PDT) Received: from ug-out-1314.google.com (ug-out-1314.google.com [66.249.92.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id k86CwADW002687 for ; Wed, 6 Sep 2006 05:58:11 -0700 Received: by ug-out-1314.google.com with SMTP id j3so2316337ugf for ; Wed, 06 Sep 2006 05:57:36 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition; b=JiCwpuJ2Hifh+CsDCWsT5QpZ72EsH9uRCgasR1a/t1XMnpEt/mvX5RC3wUu3j+zCflFTmlVqhDTpCByNruQ8pI69agToeVy3NCELmOIW48MJLg+oEUJqGfG70zK5v1iclg7IwwyMFQp9L/M30qCaOfdIH1vpnaxBICAT4+VQIWU= Received: by 10.66.216.20 with SMTP id o20mr4320806ugg; Wed, 06 Sep 2006 04:59:24 -0700 (PDT) Received: by 10.67.23.8 with HTTP; Wed, 6 Sep 2006 04:59:24 -0700 (PDT) Message-ID: <60fdb1ad0609060459k6132f8b8s40e4f20f51a746ed@mail.gmail.com> Date: Wed, 6 Sep 2006 12:59:24 +0100 From: "Vijay Gill" To: xfs@oss.sgi.com Subject: Bad block on partition, how to deal with it? MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 8905 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: vijay.s.gill@gmail.com Precedence: bulk X-list: xfs Content-Length: 515 Lines: 18 Hi, Got this bad sector in a seagate 40G hard disk. 
Is there any tool under linux to scan the surface of the disk and mark the sectors bad in file system (or at even lower level like seatools does)? Running Linux Fedora Core 5. In the mean while I am doing a dd on that partition to copy the data and try to recover it from there. Also I have run badblocks to get the number of the block which is bad, but how do I get it marked now so that the OS does not try to allocate it for data in future. Thanks Vijay From owner-xfs@oss.sgi.com Wed Sep 6 07:23:30 2006 Received: with ECARTIS (v1.0.0; list xfs); Wed, 06 Sep 2006 07:23:40 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com [66.187.233.31]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id k86ENSDW015430 for ; Wed, 6 Sep 2006 07:23:30 -0700 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.12.11.20060308/8.12.11) with ESMTP id k86EMjM2004086; Wed, 6 Sep 2006 10:22:45 -0400 Received: from pobox-2.corp.redhat.com (pobox-2.corp.redhat.com [10.11.255.15]) by int-mx1.corp.redhat.com (8.12.11.20060308/8.12.11) with ESMTP id k86EMiPA027491; Wed, 6 Sep 2006 10:22:44 -0400 Received: from [10.15.80.10] (neon.msp.redhat.com [10.15.80.10]) by pobox-2.corp.redhat.com (8.13.1/8.13.1) with ESMTP id k86EMi7r020600; Wed, 6 Sep 2006 10:22:44 -0400 Message-ID: <44FED9B3.5080308@sandeen.net> Date: Wed, 06 Sep 2006 09:22:43 -0500 From: Eric Sandeen User-Agent: Thunderbird 1.5.0.5 (X11/20060808) MIME-Version: 1.0 To: Vijay Gill CC: xfs@oss.sgi.com Subject: Re: Bad block on partition, how to deal with it? 
References: <60fdb1ad0609060459k6132f8b8s40e4f20f51a746ed@mail.gmail.com> In-Reply-To: <60fdb1ad0609060459k6132f8b8s40e4f20f51a746ed@mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 8907 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: sandeen@sandeen.net Precedence: bulk X-list: xfs Content-Length: 954 Lines: 29 Vijay Gill wrote: > Hi, > > Got this bad sector in a seagate 40G hard disk. You should buy a new disk for $20 or so :) > Is there any tool > under linux to scan the surface of the disk and mark the sectors bad > in file system (or at even lower level like seatools does)? xfs has no badblocks support. If you can convince the drive to remap the block with vendor tools then maybe it's ok. But modern drives remap on their own; if you have a block that can't be remapped then your drive is probably not long for this world. Don't try to keep using it. > Running Linux Fedora Core 5. > > In the mean while I am doing a dd on that partition to copy the data > and try to recover it from there. > > Also I have run badblocks to get the number of the block which is bad, > but how do I get it marked now so that the OS does not try to allocate > it for data in future. With xfs, you don't. It's not worth it IMHO, just get a new disk. 
-Eric From owner-xfs@oss.sgi.com Wed Sep 6 10:23:51 2006 Received: with ECARTIS (v1.0.0; list xfs); Wed, 06 Sep 2006 10:24:05 -0700 (PDT) Received: from smtp102.sbc.mail.mud.yahoo.com (smtp102.sbc.mail.mud.yahoo.com [68.142.198.201]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id k86HNiDW013417 for ; Wed, 6 Sep 2006 10:23:51 -0700 Received: (qmail 70801 invoked from network); 6 Sep 2006 17:23:08 -0000 Received: from unknown (HELO stupidest.org) (cwedgwood@sbcglobal.net@71.202.63.228 with login) by smtp102.sbc.mail.mud.yahoo.com with SMTP; 6 Sep 2006 17:23:08 -0000 Received: by tuatara.stupidest.org (Postfix, from userid 10000) id CB8531814338; Wed, 6 Sep 2006 10:23:06 -0700 (PDT) Date: Wed, 6 Sep 2006 10:23:06 -0700 From: Chris Wedgwood To: Vijay Gill Cc: xfs@oss.sgi.com Subject: Re: Bad block on partition, how to deal with it? Message-ID: <20060906172306.GA19108@tuatara.stupidest.org> References: <60fdb1ad0609060459k6132f8b8s40e4f20f51a746ed@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <60fdb1ad0609060459k6132f8b8s40e4f20f51a746ed@mail.gmail.com> X-archive-position: 8908 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: cw@f00f.org Precedence: bulk X-list: xfs Content-Length: 732 Lines: 20 On Wed, Sep 06, 2006 at 12:59:24PM +0100, Vijay Gill wrote: > Got this bad sector in a seagate 40G hard disk. record where it is, dd over it and hopefully the drive will remap it (if there are many bad sectors the drive is probably toast) if you know which block is/was bad you can use xfs_bmap to figure out which file it was in > Also I have run badblocks to get the number of the block which is > bad, but how do I get it marked now so that the OS does not try to > allocate it for data in future. 
modern drives (pretty much anything less than 10 years old) will remap bad sectors on writes, if they fail to do this get a new drive smartctl will usually let you get a count of how many times the drive has done this since From owner-xfs@oss.sgi.com Wed Sep 6 16:04:14 2006 Received: with ECARTIS (v1.0.0; list xfs); Wed, 06 Sep 2006 16:04:40 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id k86N3xDW026696 for ; Wed, 6 Sep 2006 16:04:12 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA16212; Thu, 7 Sep 2006 09:03:07 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id k86N2feQ13557459; Thu, 7 Sep 2006 09:02:42 +1000 (AEST) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id k86N2cTC13528081; Thu, 7 Sep 2006 09:02:38 +1000 (AEST) Date: Thu, 7 Sep 2006 09:02:38 +1000 From: David Chinner To: Jesper Juhl Cc: Linux Kernel Mailing List , xfs@oss.sgi.com Subject: Re: Wrong free space reported for XFS filesystem Message-ID: <20060906230238.GJ5737019@melbourne.sgi.com> References: <9a8748490609060154ye8730b0n16e23524010a35e4@mail.gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <9a8748490609060154ye8730b0n16e23524010a35e4@mail.gmail.com> User-Agent: Mutt/1.4.2.1i X-archive-position: 8910 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Content-Length: 1209 Lines: 40 On Wed, Sep 06, 2006 at 10:54:34AM +0200, Jesper Juhl wrote: > For your information; > > I've been running a bunch of benchmarks on a 250GB XFS filesystem. 
> After the benchmarks had run for a few hours and almost filled up the > fs, I removed all the files and did a "df -h" with interesting > results : > > /dev/mapper/Data1-test > 250G -64Z 251G 101% /mnt/test > > "df -k" reported this : > > /dev/mapper/Data1-test > 262144000 -73786976294838202960 262147504 101% /mnt/test .... > The filesystem is mounted like this : > > /dev/mapper/Data1-test on /mnt/test type xfs > (rw,noatime,ihashsize=64433,logdev=/dev/Log1/test_log,usrquota) So the in-core accounting has underflowed by a small amount but the on disk accounting is correct. We've had a few reports of this that I know of over the past couple of years, but we've never managed to find a reproducible test case for it. Can you describe what benchmark you were running, what kernel you were using and whether any of the tests hit an ENOSPC condition? Also, in future can you cc xfs@oss.sgi.com on XFS bug reports? Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Wed Sep 6 21:10:15 2006 Received: with ECARTIS (v1.0.0; list xfs); Wed, 06 Sep 2006 21:10:31 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id k8749xDW013700 for ; Wed, 6 Sep 2006 21:10:12 -0700 Received: from chook.melbourne.sgi.com (chook.melbourne.sgi.com [134.14.54.237]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA22746; Thu, 7 Sep 2006 14:09:08 +1000 Received: by chook.melbourne.sgi.com (Postfix, from userid 16302) id 2DDBF58CF851; Thu, 7 Sep 2006 14:09:08 +1000 (EST) To: linux-xfs@oss.sgi.com, sgi.bugs.xfs@engr.sgi.com Subject: TAKE 955993 - quota oops fix Message-Id: <20060907040908.2DDBF58CF851@chook.melbourne.sgi.com> Date: Thu, 7 Sep 2006 14:09:08 +1000 (EST) From: nathans@sgi.com (Nathan Scott) X-archive-position: 8913 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: 
xfs-bounce@oss.sgi.com X-original-sender: nathans@sgi.com Precedence: bulk X-list: xfs Content-Length: 467 Lines: 14 Fix a bad pointer dereference in the quota statvfs handling. Date: Thu Sep 7 14:08:44 AEST 2006 Workarea: chook.melbourne.sgi.com:/build/nathans/xfs-linux Inspected by: dgc The following file(s) were checked into: longdrop.melbourne.sgi.com:/isms/xfs-kern/xfs-linux-melb Modid: xfs-linux-melb:xfs-kern:26934a quota/xfs_qm_bhv.c - 1.23 - changed http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/quota/xfs_qm_bhv.c.diff?r1=text&tr1=1.23&r2=text&tr2=1.22&f=h From owner-xfs@oss.sgi.com Wed Sep 6 23:52:26 2006 Received: with ECARTIS (v1.0.0; list xfs); Wed, 06 Sep 2006 23:52:40 -0700 (PDT) Received: from omx1.americas.sgi.com (omx1.americas.sgi.com [198.149.16.13]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id k876qKDW006399 for ; Wed, 6 Sep 2006 23:52:25 -0700 Received: from internal-mail-relay1.corp.sgi.com (internal-mail-relay1.corp.sgi.com [198.149.32.52]) by omx1.americas.sgi.com (8.12.10/8.12.9/linux-outbound_gateway-1.1) with ESMTP id k875rUnx001062 for ; Thu, 7 Sep 2006 00:53:30 -0500 Received: from omx2.sgi.com ([198.149.32.25]) by internal-mail-relay1.corp.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id k875r98s37777792 for ; Wed, 6 Sep 2006 22:53:09 -0700 (PDT) Received: from outhouse.melbourne.sgi.com (outhouse.melbourne.sgi.com [134.14.52.145]) by omx2.sgi.com (8.12.11/8.12.9/linux-outbound_gateway-1.1) with ESMTP id k878QdFu017978; Thu, 7 Sep 2006 01:26:40 -0700 Received: from [134.14.55.232] (chatz.melbourne.sgi.com [134.14.55.232]) by outhouse.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id k875t22G15375408; Thu, 7 Sep 2006 15:55:03 +1000 (AEST) Message-ID: <44FFB39C.3000700@melbourne.sgi.com> Date: Thu, 07 Sep 2006 15:52:28 +1000 From: David Chatterton Reply-To: chatz@melbourne.sgi.com Organization: SGI User-Agent: Thunderbird 1.5.0.5 (Windows/20060719) MIME-Version: 1.0 To: torvalds@osdl.org CC: akpm@osdl.org, 
xfs@oss.sgi.com, axboe@kernel.dk Subject: XFS update for 2.6.18-rc6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-archive-position: 8914 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: chatz@melbourne.sgi.com Precedence: bulk X-list: xfs Content-Length: 3500 Lines: 100 Hi Linus, Please pull from: git://oss.sgi.com:8090/xfs/xfs-2.6 This will update the following files: fs/xfs/linux-2.6/xfs_aops.c | 18 +++++++++++++----- fs/xfs/linux-2.6/xfs_lrw.c | 27 ++++++++++++++++++++++----- fs/xfs/quota/xfs_qm_bhv.c | 2 +- fs/xfs/xfs_alloc.h | 20 ++++++++++++++++++++ fs/xfs/xfs_fsops.c | 16 ++++++++++------ fs/xfs/xfs_mount.c | 32 ++++++++------------------------ fs/xfs/xfs_vfsops.c | 3 ++- 7 files changed, 76 insertions(+), 42 deletions(-) through these commits: commit 4be536debe3f7b0c62283e77fd6bd8bdb9f83c6f Author: David Chinner Date: Thu Sep 7 14:26:50 2006 +1000 [XFS] Prevent free space oversubscription and xfssyncd looping. The fix for recent ENOSPC deadlocks introduced certain limitations on allocations. The fix could cause xfssyncd to loop endlessly if we did not leave some space free for the allocator to work correctly. Basically, we needed to ensure that we had at least 4 blocks free for an AG free list and a block for the inode bmap btree at all times. However, this did not take into account the fact that each AG has a free list that needs 4 blocks. Hence any filesystem with more than one AG could cause oversubscription of free space and make xfssyncd spin forever trying to allocate space needed for AG freelists that was not available in the AG. The following patch reserves space for the free lists in all AGs plus the inode bmap btree which prevents oversubscription. It also prevents those blocks from being reported as free space (as they can never be used) and makes the SMP in-core superblock accounting code and the reserved block ioctl respect this requirement. 
SGI-PV: 955674 SGI-Modid: xfs-linux-melb:xfs-kern:26894a Signed-off-by: David Chinner Signed-off-by: David Chatterton commit 721259bce2851893155c6cb88a3f8ecb106b348c Author: Lachlan McIlroy Date: Thu Sep 7 14:27:05 2006 +1000 [XFS] Fix ABBA deadlock between i_mutex and iolock. Avoid calling __blockdev_direct_IO for the DIO_OWN_LOCKING case for direct I/O reads since it drops and reacquires the i_mutex while holding the iolock and this violates the locking order. SGI-PV: 955696 SGI-Modid: xfs-linux-melb:xfs-kern:26898a Signed-off-by: Lachlan McIlroy Signed-off-by: David Chatterton commit 0a8d17d090a4939643a52194b7d4a4001b9b2d93 Author: David Chinner Date: Thu Sep 7 14:27:15 2006 +1000 [XFS] Fix xfs_splice_write() so appended data gets to disk. xfs_splice_write() failed to update the on disk inode size when extending the file, so when the file was closed the range extended by splice was truncated off. Hence any region of a file written to by splice would end up as a hole full of zeros. SGI-PV: 955939 SGI-Modid: xfs-linux-melb:xfs-kern:26920a Signed-off-by: David Chinner Signed-off-by: David Chatterton commit 0edc7d0f3709e8c3bb7e69c4df614218a753361e Author: Nathan Scott Date: Thu Sep 7 14:27:23 2006 +1000 [XFS] Fix a bad pointer dereference in the quota statvfs handling. 
SGI-PV: 955993 SGI-Modid: xfs-linux-melb:xfs-kern:26934a Signed-off-by: Nathan Scott Signed-off-by: David Chatterton Thanks, David From owner-xfs@oss.sgi.com Thu Sep 7 19:34:49 2006 Received: with ECARTIS (v1.0.0; list xfs); Thu, 07 Sep 2006 19:35:09 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id k882YaDW006722 for ; Thu, 7 Sep 2006 19:34:47 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA19450; Fri, 8 Sep 2006 12:33:44 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id k882XfeQ14414568; Fri, 8 Sep 2006 12:33:42 +1000 (AEST) Received: (from dgc@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) id k882XdAu14403235; Fri, 8 Sep 2006 12:33:39 +1000 (AEST) Date: Fri, 8 Sep 2006 12:33:39 +1000 From: David Chinner To: Jesper Juhl Cc: Linux Kernel Mailing List , xfs@oss.sgi.com Subject: Re: Wrong free space reported for XFS filesystem Message-ID: <20060908023339.GF10950339@melbourne.sgi.com> References: <9a8748490609060154ye8730b0n16e23524010a35e4@mail.gmail.com> <20060906230238.GJ5737019@melbourne.sgi.com> <9a8748490609070717q6ed9111ckdc3de025dc44938b@mail.gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <9a8748490609070717q6ed9111ckdc3de025dc44938b@mail.gmail.com> User-Agent: Mutt/1.4.2.1i X-archive-position: 8922 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: dgc@sgi.com Precedence: bulk X-list: xfs Content-Length: 1367 Lines: 46 On Thu, Sep 07, 2006 at 04:17:53PM +0200, Jesper Juhl wrote: > On 07/09/06, David Chinner wrote: > >On Wed, Sep 06, 2006 at 10:54:34AM +0200, Jesper Juhl wrote: > >> For your information; > >> > 
>> I've been running a bunch of benchmarks on a 250GB XFS filesystem. > >> After the benchmarks had run for a few hours and almost filled up the > >> fs, I removed all the files and did a "df -h" with interesting > >> results : ..... > >So the in-core accounting has underflowed by a small amount but the > >on disk accounting is correct. > > > >We've had a few reports of this that I know of over the past > >couple of years, but we've never managed to find a reproducible > >test case for it. > >Can you describe what benchmark you were running, what kernel you were > >using > > The kernel is 2.6.18-rc6 SMP Ok, so it's a current problem.... > >and whether any of the tests hit an ENOSPC condition? > > > That I don't know. > > The script I was running is this one : That doesn't really narrow down the scope at all. All that script tells me is that the problem is somewhere inside XFS.... :/ Can you try to isolate which of the loads is causing the problem? That being said, this looks like a good stress load - I'll pass it on to our QA folks... Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group From owner-xfs@oss.sgi.com Fri Sep 8 08:05:24 2006 Received: with ECARTIS (v1.0.0; list xfs); Fri, 08 Sep 2006 08:05:45 -0700 (PDT) Received: from a.mx.filmlight.ltd.uk (host217-40-27-25.in-addr.btopenworld.com [217.40.27.25]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id k88F5LDW005483 for ; Fri, 8 Sep 2006 08:05:24 -0700 Received: (qmail 31358 invoked from network); 8 Sep 2006 14:04:43 -0000 Received: from orthia.filmlight.ltd.uk (10.44.0.109) by a.mx.filmlight.ltd.uk with SMTP; 8 Sep 2006 14:04:43 -0000 Subject: Re: race in xfs_rename? 
(fwd) From: Roger Willcocks To: David Chinner Cc: Nathan Scott , xfs@oss.sgi.com In-Reply-To: <20060906023128.GN10950339@melbourne.sgi.com> References: <20060906083448.J3365803@wobbly.melbourne.sgi.com> <20060906023128.GN10950339@melbourne.sgi.com> Content-Type: text/plain Organization: Message-Id: <1157724365.873.71.camel@orthia.filmlight.ltd.uk> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-4) Date: 08 Sep 2006 15:06:05 +0100 Content-Transfer-Encoding: 7bit X-archive-position: 8923 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: roger@filmlight.ltd.uk Precedence: bulk X-list: xfs Content-Length: 2483 Lines: 68 On Wed, 2006-09-06 at 03:31, David Chinner wrote: > On Wed, Sep 06, 2006 at 08:34:48AM +1000, Nathan Scott wrote: > > Hi Roger, > > > > I'm gonna be rude and fwd your mail to the list - in the hope > > someone there will be able to help you. I'm running out of time > > @sgi and have a bunch of stuff still to get done before I skip > > outta here - having to look at the xfs_rename locking right now > > might just be enough to make my head explode. ;) > > > > cheers. > > > > ----- Forwarded message from Roger Willcocks ----- > > > > Date: 05 Sep 2006 14:30:30 +0100 > > To: nathans@sgi.com > > X-Mailer: Ximian Evolution 1.2.2 (1.2.2-4) > > From: Roger Willcocks > > Subject: race in xfs_rename? > > > > Hi Nathan, > > > > I think I must be missing something here: > > > > xfs_rename calls xfs_lock_for_rename, which i-locks the source file and > > directory, target directory, and (if it already exists) the target file. > > > > It returns a two-to-four entry list of participating inodes. > > > > xfs_rename unlocks them all, creates a transaction, and then locks them > > all again. > > > > Surely while they're unlocked, another processor could jump in and > > fiddle with the underlying files and directories? 
> > I don't think that can happen due to i_mutex locking at the vfs layer > i.e. in do_rename() via lock_rename() and in vfs_rename_{dir,other}(). > Hence I think it is safe for XFS to do what it does. > > FWIW, in Irix where there is no higher layer locking, XFS has extra > checks and locks (ancestor lock, inode generation count checks, etc) > to ensure nothing changed when the locks were dropped and regained. > AFAICT, the Linux XFS code doesn't need to do any of this because the VFS > guarantees us that things won't change..... > > Cheers, > > Dave. Hi Dave & Nathan, yes that makes sense. I'm currently chasing a couple of xfs shutdowns on customer clusters, and 'rename' seems to be a factor, although it could just as well be a dodgy network driver, or whatever. I'll let you know if I find a reproducible test case. I've also been looking into a couple of 'LEAFN node level is X' warnings from xfs_repair, and it seems to me that leaf nodes don't actually have a /level/ member, although internal nodes do (compare xfs_dir2_leaf_hdr_t and xfs_da_intnode_t). The value being tested by xfs_repair is actually leaf->hdr.stale, so the warning is bogus. Or so it seems to me... 
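A minimal sketch of the overlap described above, using simplified stand-in structs rather than the real XFS headers. The struct names, the field layout and the helper level_seen_in_leaf() are assumptions made for illustration, and on-disk byte order is ignored; the point is only that a "level" read through the node view of a leaf block returns whatever the leaf stored in its stale count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins; NOT the real XFS definitions (layout is assumed). */
struct blkinfo {
    uint32_t forw;
    uint32_t back;
    uint16_t magic;
    uint16_t pad;
};

struct leaf_hdr {            /* stand-in for xfs_dir2_leaf_hdr_t */
    struct blkinfo info;
    uint16_t count;
    uint16_t stale;          /* a leaf header has no level field */
};

struct node_hdr {            /* stand-in for the header in xfs_da_intnode_t */
    struct blkinfo info;
    uint16_t count;
    uint16_t level;
};

/* In this sketch both views place their last field at the same offset. */
static_assert(offsetof(struct leaf_hdr, stale) ==
              offsetof(struct node_hdr, level),
              "stale and level overlap in this simplified layout");

union block_hdr {
    struct leaf_hdr leaf;
    struct node_hdr node;
};

/* Hypothetical helper: the "level" a node-style check would see when it is
 * handed a leaf block that has the given number of stale entries. */
static uint16_t level_seen_in_leaf(uint16_t stale_entries)
{
    union block_hdr b = { .leaf = { .count = 8, .stale = stale_entries } };
    return b.node.level;     /* same bytes as b.leaf.stale */
}
```

Under these assumptions a leaf with no stale entries would pass a "level is zero" style check, and a leaf with some stale entries would trip it, which matches the suspicion that the warning depends on the stale count rather than on any real level.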
-- Roger From owner-xfs@oss.sgi.com Fri Sep 8 11:30:42 2006 Received: with ECARTIS (v1.0.0; list xfs); Fri, 08 Sep 2006 11:30:54 -0700 (PDT) Received: from amsfep11-int.chello.nl (amsfep17-int.chello.nl [213.46.243.15]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id k88IUfDW003607 for ; Fri, 8 Sep 2006 11:30:42 -0700 Received: from cable-213-132-154-40.upc.chello.be ([213.132.154.40]) by amsfep11-int.chello.nl (InterMail vM.6.01.05.04 201-2131-123-105-20051025) with ESMTP id <20060908172214.XVNB14551.amsfep11-int.chello.nl@cable-213-132-154-40.upc.chello.be> for ; Fri, 8 Sep 2006 19:22:14 +0200 From: Grozdan Nikolov To: xfs@oss.sgi.com Subject: XFS questions Date: Fri, 8 Sep 2006 19:23:07 +0200 User-Agent: KMail/1.9.4 MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200609081923.08215.microchip@chello.be> X-archive-position: 8927 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: microchip@chello.be Precedence: bulk X-list: xfs Content-Length: 837 Lines: 21 Hi, I have a few questions regarding the data integrity on XFS filesystems. I have 4 servers here all running on XFS partitions and I'm a bit concerned about the data integrity of an XFS filesystem. After reading a lot of benchmarks/user experiences I came to the conclusion that XFS is really very fast, as I experience it here on my servers too, but when it comes to data integrity it is wise not to use XFS for partitions containing important files as XFS may not be able to recover them after, let's say, a power outage. I'm also worried about the 'zeroing' thing in XFS. I have 3 questions... 1) How reliable is XFS at data-integrity? 2) Will the 'zeroing' thing be removed/fixed in the near future? 3) Will XFS ever support ordered or journalled mode like ReiserFS or Ext3? 
Thanks in advance and best regards, Grozdan From owner-xfs@oss.sgi.com Fri Sep 8 12:20:24 2006 Received: with ECARTIS (v1.0.0; list xfs); Fri, 08 Sep 2006 12:20:37 -0700 (PDT) Received: from smtp105.sbc.mail.mud.yahoo.com (smtp105.sbc.mail.mud.yahoo.com [68.142.198.204]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id k88JKKDW015784 for ; Fri, 8 Sep 2006 12:20:24 -0700 Received: (qmail 75026 invoked from network); 8 Sep 2006 19:19:42 -0000 Received: from unknown (HELO stupidest.org) (cwedgwood@sbcglobal.net@71.202.63.228 with login) by smtp105.sbc.mail.mud.yahoo.com with SMTP; 8 Sep 2006 19:19:41 -0000 Received: by tuatara.stupidest.org (Postfix, from userid 10000) id 50807180B3F6; Fri, 8 Sep 2006 12:19:40 -0700 (PDT) Date: Fri, 8 Sep 2006 12:19:40 -0700 From: Chris Wedgwood To: Grozdan Nikolov Cc: xfs@oss.sgi.com Subject: Re: XFS questions Message-ID: <20060908191940.GC30358@tuatara.stupidest.org> References: <200609081923.08215.microchip@chello.be> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200609081923.08215.microchip@chello.be> X-archive-position: 8928 X-ecartis-version: Ecartis v1.0.0 Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com X-original-sender: cw@f00f.org Precedence: bulk X-list: xfs Content-Length: 1725 Lines: 49 On Fri, Sep 08, 2006 at 07:23:07PM +0200, Grozdan Nikolov wrote: > I'm also worried about the 'zeroing' thing in XFS. Most of what people claim is a bit vague and often incorrect. > 1) How reliable is XFS at data-integrity? Fine, if your applications are sane. MTAs like postfix for example *never* had any problems with XFS. > 2) Will the 'zeroing' thing be removed/fixed in the near future? What usually happens is that if you truncate over a file and write data, *then* lose power, some of the data might not have been written to disk yet, so when you read it back XFS returns zeroes. This is normal/expected for jo