xfs
[Top] [All Lists]

Re: xfssyncd and disk spin down

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: xfssyncd and disk spin down
From: Petre Rodan <petre.rodan@xxxxxxxxxx>
Date: Thu, 20 Jan 2011 12:01:43 +0200
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20101231001323.GD15179@dastard>
References: <20101223165532.GA23813@xxxxxxxxxxxxxxxx> <20101227021904.GA24828@dastard> <20101227061629.GA2275@xxxxxxxxxxxxxxxxxx> <20101227140750.GB24828@dastard> <20101227171939.GA7759@xxxxxxxxxxxxxxxxxx> <20101231001323.GD15179@dastard>
User-agent: Mutt/1.5.21 (2010-09-15)
hello Dave,

On Fri, Dec 31, 2010 at 11:13:23AM +1100, Dave Chinner wrote:
> On Mon, Dec 27, 2010 at 07:19:39PM +0200, Petre Rodan wrote:
> > 
> > Hello Dave,
> > 
> > On Tue, Dec 28, 2010 at 01:07:50AM +1100, Dave Chinner wrote:
> > > Turn on the XFS tracing so we can see what is being written every
> > > 36s.  When the problem shows up:
> > > 
> > > # echo 1 > /sys/kernel/debug/tracing/events/xfs/enable
> > > # sleep 100
> > > # cat /sys/kernel/debug/tracing/trace > trace.out
> > > # echo 0 > /sys/kernel/debug/tracing/events/xfs/enable
> > > 
> > > And post the trace.out file for us to look at.
> > 
> > attached.
> > 
> > you can disregard all the lvm partitions ('dev 254:.*') since they are on a 
> > different drive, probably only 8:17 is of interest.
> 
> Ok, I can see the problem. The original patch I tested:
> 
> http://oss.sgi.com/archives/xfs/2010-08/msg00026.html
> 
> Made the log covering dummy transaction a synchronous transaction so
> that the log was written and the superblock unpinned immediately to
> allow the xfsbufd to write back the superblock and empty the AIL
> before the next log covering check.
> 
> On review, the log covering dummy transaction got changed to an
> async transaction, so the superblock buffer is not unpinned
> immediately. This was the patch committed:
> 
> http://oss.sgi.com/archives/xfs/2010-08/msg00197.html
> 
> As a result, the success of log covering and idling is then
> dependent on whether the log gets written to disk to unpin the
> superblock buffer before the next xfssyncd run. It seems that there
> is a large chance that this log write does not happen, so the
> filesystem never idles correctly. I've reproduced it here, and only
> in one test out of ten did the filesystem enter an idle state
> correctly. I guess I was unlucky enough to hit that 1-in-10 case
> when I tested the modified patch.
> 
> I'll cook up a patch to make the log covering behave like the
> original patch I sent...

I presume that the new fix should be provided by "xfs: ensure log covering 
transactions are synchronous", so I tested 2.6.37 patched with it and then 
2.6.38_rc1 that has it included.

instead of having xfssyncd write to the drive every 36s, we now have this:

  8,32   0      162   828.393976164  1413  A  WS 977599761 + 2 <- (8,33) 
977597713
  8,32   0      163   828.393977820  1413  Q  WS 977599761 + 2 [xfssyncd/sdc1]
  8,32   0      167   846.401286530  1407  A   W 2048 + 1 <- (8,33) 0
  8,32   0      168   846.401288007  1407  Q   W 2048 + 1 [xfsbufd/sdc1]
  8,32   0      174   864.412155132  1413  A  WS 977599763 + 2 <- (8,33) 
977597715
  8,32   0      175   864.412156637  1413  Q  WS 977599763 + 2 [xfssyncd/sdc1]
  8,32   1      240   882.467053566  1407  A   W 2048 + 1 <- (8,33) 0
  8,32   1      241   882.467055152  1407  Q   W 2048 + 1 [xfsbufd/sdc1]
  8,32   0      179   900.426084634  1413  A  WS 977599765 + 2 <- (8,33) 
977597717
  8,32   0      180   900.426086076  1413  Q  WS 977599765 + 2 [xfssyncd/sdc1]
  8,32   0      185   918.457066312  1407  A   W 2048 + 1 <- (8,33) 0
  8,32   0      186   918.457068003  1407  Q   W 2048 + 1 [xfsbufd/sdc1]
  8,32   1      252   936.448120095  1413  A  WS 977599767 + 2 <- (8,33) 
977597719
  8,32   1      253   936.448121740  1413  Q  WS 977599767 + 2 [xfssyncd/sdc1]
  8,32   0      192   954.447207652  1407  A   W 2048 + 1 <- (8,33) 0
  8,32   0      193   954.447209114  1407  Q   W 2048 + 1 [xfsbufd/sdc1]
  8,32   0      199   972.458423873  1413  A  WS 977599769 + 2 <- (8,33) 
977597721
  8,32   0      200   972.458425189  1413  Q  WS 977599769 + 2 [xfssyncd/sdc1]
  8,32   1      266   990.457562449  1407  A   W 2048 + 1 <- (8,33) 0
  8,32   1      267   990.457563909  1407  Q   W 2048 + 1 [xfsbufd/sdc1]
  8,32   0      204  1008.476781955  1413  A  WS 977599771 + 2 <- (8,33) 
977597723
  8,32   0      205  1008.476783410  1413  Q  WS 977599771 + 2 [xfssyncd/sdc1]
  8,32   1      278  1026.531988963  1407  A   W 2048 + 1 <- (8,33) 0
  8,32   1      279  1026.531990718  1407  Q   W 2048 + 1 [xfsbufd/sdc1]
  8,32   0      209  1044.491342417  1413  A  WS 977599773 + 2 <- (8,33) 
977597725
  8,32   0      210  1044.491344042  1413  Q  WS 977599773 + 2 [xfssyncd/sdc1]
  8,32   0      214  1062.502603664  1407  A   W 2048 + 1 <- (8,33) 0
  8,32   0      215  1062.502605117  1407  Q   W 2048 + 1 [xfsbufd/sdc1]
  8,32   0      216  1062.502611005  1407  G   W 2048 + 1 [xfsbufd/sdc1]

in other words xfsyncd and xfsbufd now alternate at 18s intervals keeping the 
drive busy with nothing constructive hours after the last write to the drive.

to add to the misfortune, 'mount -o remount ' is no longer able to bring the 
drive to a quiet state since 2.6.37, so now the only way to achieve an idle 
drive is to fully umount and then remount the partition.

just for the record, this is a different drive then at the beginning of the 
thread, and it has these parameters:

meta-data=/dev/sdc1              isize=256    agcount=4, agsize=61047552 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=244190208, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=119233, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0

attached you'll find the trace (with accesses to other drives filtered out).

cheers,
peter

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx

-- 
petre rodan
<petre.rodan@xxxxxxxxxx>
Technical Manager
Simplex SRL, Bucharest

Attachment: trace
Description: Text document

Attachment: pgpdmcGKSpSmo.pgp
Description: PGP signature

<Prev in Thread] Current Thread [Next in Thread>