hello Dave,
On Fri, Dec 31, 2010 at 11:13:23AM +1100, Dave Chinner wrote:
> On Mon, Dec 27, 2010 at 07:19:39PM +0200, Petre Rodan wrote:
> >
> > Hello Dave,
> >
> > On Tue, Dec 28, 2010 at 01:07:50AM +1100, Dave Chinner wrote:
> > > Turn on the XFS tracing so we can see what is being written every
> > > 36s. When the problem shows up:
> > >
> > > # echo 1 > /sys/kernel/debug/tracing/events/xfs/enable
> > > # sleep 100
> > > # cat /sys/kernel/debug/tracing/trace > trace.out
> > > # echo 0 > /sys/kernel/debug/tracing/events/xfs/enable
> > >
> > > And post the trace.out file for us to look at.
> >
> > attached.
> >
> > you can disregard all the lvm partitions ('dev 254:.*') since they are on a
> > different drive, probably only 8:17 is of interest.
>
> Ok, I can see the problem. The original patch I tested:
>
> http://oss.sgi.com/archives/xfs/2010-08/msg00026.html
>
> Made the log covering dummy transaction a synchronous transaction so
> that the log was written and the superblock unpinned immediately to
> allow the xfsbufd to write back the superblock and empty the AIL
> before the next log covering check.
>
> On review, the log covering dummy transaction got changed to an
> async transaction, so the superblock buffer is not unpinned
> immediately. This was the patch committed:
>
> http://oss.sgi.com/archives/xfs/2010-08/msg00197.html
>
> As a result, the success of log covering and idling is then
> dependent on whether the log gets written to disk to unpin the
> superblock buffer before the next xfssyncd run. It seems that there
> is a large chance that this log write does not happen, so the
> filesystem never idles correctly. I've reproduced it here, and only
> in one test out of ten did the filesystem enter an idle state
> correctly. I guess I was unlucky enough to hit that 1-in-10 case
> when I tested the modified patch.
>
> I'll cook up a patch to make the log covering behave like the
> original patch I sent...
I presume that the new fix should be provided by "xfs: ensure log covering
transactions are synchronous", so I tested 2.6.37 patched with it and then
2.6.38_rc1 that has it included.
instead of having xfssyncd write to the drive every 36s, we now have this:
8,32 0 162 828.393976164 1413 A WS 977599761 + 2 <- (8,33)
977597713
8,32 0 163 828.393977820 1413 Q WS 977599761 + 2 [xfssyncd/sdc1]
8,32 0 167 846.401286530 1407 A W 2048 + 1 <- (8,33) 0
8,32 0 168 846.401288007 1407 Q W 2048 + 1 [xfsbufd/sdc1]
8,32 0 174 864.412155132 1413 A WS 977599763 + 2 <- (8,33)
977597715
8,32 0 175 864.412156637 1413 Q WS 977599763 + 2 [xfssyncd/sdc1]
8,32 1 240 882.467053566 1407 A W 2048 + 1 <- (8,33) 0
8,32 1 241 882.467055152 1407 Q W 2048 + 1 [xfsbufd/sdc1]
8,32 0 179 900.426084634 1413 A WS 977599765 + 2 <- (8,33)
977597717
8,32 0 180 900.426086076 1413 Q WS 977599765 + 2 [xfssyncd/sdc1]
8,32 0 185 918.457066312 1407 A W 2048 + 1 <- (8,33) 0
8,32 0 186 918.457068003 1407 Q W 2048 + 1 [xfsbufd/sdc1]
8,32 1 252 936.448120095 1413 A WS 977599767 + 2 <- (8,33)
977597719
8,32 1 253 936.448121740 1413 Q WS 977599767 + 2 [xfssyncd/sdc1]
8,32 0 192 954.447207652 1407 A W 2048 + 1 <- (8,33) 0
8,32 0 193 954.447209114 1407 Q W 2048 + 1 [xfsbufd/sdc1]
8,32 0 199 972.458423873 1413 A WS 977599769 + 2 <- (8,33)
977597721
8,32 0 200 972.458425189 1413 Q WS 977599769 + 2 [xfssyncd/sdc1]
8,32 1 266 990.457562449 1407 A W 2048 + 1 <- (8,33) 0
8,32 1 267 990.457563909 1407 Q W 2048 + 1 [xfsbufd/sdc1]
8,32 0 204 1008.476781955 1413 A WS 977599771 + 2 <- (8,33)
977597723
8,32 0 205 1008.476783410 1413 Q WS 977599771 + 2 [xfssyncd/sdc1]
8,32 1 278 1026.531988963 1407 A W 2048 + 1 <- (8,33) 0
8,32 1 279 1026.531990718 1407 Q W 2048 + 1 [xfsbufd/sdc1]
8,32 0 209 1044.491342417 1413 A WS 977599773 + 2 <- (8,33)
977597725
8,32 0 210 1044.491344042 1413 Q WS 977599773 + 2 [xfssyncd/sdc1]
8,32 0 214 1062.502603664 1407 A W 2048 + 1 <- (8,33) 0
8,32 0 215 1062.502605117 1407 Q W 2048 + 1 [xfsbufd/sdc1]
8,32 0 216 1062.502611005 1407 G W 2048 + 1 [xfsbufd/sdc1]
in other words xfsyncd and xfsbufd now alternate at 18s intervals keeping the
drive busy with nothing constructive hours after the last write to the drive.
to add to the misfortune, 'mount -o remount ' is no longer able to bring the
drive to a quiet state since 2.6.37, so now the only way to achieve an idle
drive is to fully umount and then remount the partition.
just for the record, this is a different drive then at the beginning of the
thread, and it has these parameters:
meta-data=/dev/sdc1 isize=256 agcount=4, agsize=61047552 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=244190208, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal bsize=4096 blocks=119233, version=2
= sectsz=512 sunit=0 blks, lazy-count=0
realtime =none extsz=4096 blocks=0, rtextents=0
attached you'll find the trace (with accesses to other drives filtered out).
cheers,
peter
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
--
petre rodan
<petre.rodan@xxxxxxxxxx>
Technical Manager
Simplex SRL, Bucharest
trace
Description: Text document
pgpdmcGKSpSmo.pgp
Description: PGP signature
|