This looks really promising.
I'm currently reading through the code again to
see what kind of implications this might have.
I'm worried that your patch might increase file fragmentation,
but that is just at first glance. I'll look some more and run
some testing with and without your patch.
I'm looking at xfs_inactive_free_eofblocks again; I think
there may be an issue with how the xfs_inode di_size and the linux
inode i_size interact.
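
Roughly, this is the question I want to rule out (illustrative C only,
not the actual 1.3.1 code; the struct and names below are made up):

	/* Sketch: which size should bound the trimming of post-EOF
	 * blocks if the two sizes ever disagree?                      */
	struct xfs_inode_sketch {
		unsigned long long di_size;	/* on-disk file size */
	};

	static void size_check_sketch(struct xfs_inode_sketch *ip,
				      unsigned long long linux_i_size)
	{
		if (ip->di_size != linux_i_size) {
			/* the XFS and Linux views of the file size
			 * disagree -- trimming by the wrong one could
			 * remove blocks that are still needed        */
		}
	}
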
BTW what tracing did you use to find this?
On Wed, 2004-06-09 at 20:30, Masanori TSUDA wrote:
> Hi,
>
> I have reproduced a similar problem on xfs1.3.1 (based on 2.4.21);
> my environment is as follows.
>
> nfs server :
> OS : RedHat9 + xfs1.3.1 (based on 2.4.21)
> CPU : Xeon(2.4GHz) x 2
> MEM : 1GB
> NIC : Intel PRO/1000
> Local Filesystem : XFS, the refcache is disabled.
>
> nfs client :
> OS : RedHat9 (based on 2.4.20-8)
> NIC : Intel PRO/1000
> NFS Ver. : 3
> NFS Mount Options : udp,hard,intr,wsize=8192
>
> Within 1 hour of running the test, corruption was detected.
> (To make the corruption easy to detect, I umount nfs, umount xfs,
> mount xfs and mount nfs again before comparing data, i.e. I purge the memory cache.)
> The corruption width was a multiple of 4KB, starting at a 4KB boundary.
> In many cases, it occurred in the starting part of a physical extent.
>
> I have investigated the issue using a local trace embedded in the kernel.
> I think that the issue is caused by the delayed allocation mechanism.
> Below is an example corruption scenario that I suspect is happening.
> The steps of the scenario are listed in chronological order.
>
> 1. open and write in nfsd (for write1)
> The nfs client writes 8KB of data to the file (called write1).
> The write request is processed in nfsd. nfsd calls open [linvfs_open]
> and then write [linvfs_write]. After the write, the file has several
> delayed allocation blocks beyond the end of the file, because allocation
> is done in chunks and aligned to writeiosize (see the sketch after the
> diagram below).
>
> file image
> offset=0 eof
> +----+----+----+----+----+- ... +----+
> | | | | | | | |
> +----+----+----+----+----+- ... +----+
> 4KB 4KB
> +---------+
> write data (write1)
> +------------------------------------+
> delayed allocation blocks
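>
> A rough sketch of how the reservation ends up past EOF (illustrative
> only, not the actual delayed allocation code; WRITEIO_SIZE is just an
> example value and the function name is made up):
>
>	#define WRITEIO_SIZE	65536ULL	/* example writeiosize */
>
>	/* Round the delalloc reservation up to the writeiosize
>	 * boundary, so blocks beyond the new EOF are also reserved.   */
>	static unsigned long long
>	delalloc_reservation(unsigned long long new_eof)
>	{
>		return ((new_eof + WRITEIO_SIZE - 1) / WRITEIO_SIZE)
>			* WRITEIO_SIZE;
>	}
>
>	/* e.g. delalloc_reservation(8192) == 65536, i.e. 56KB of
>	 * delayed allocation blocks now sit beyond EOF.               */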
>
> 2. allocate disk space in kupdated (for write1)
> Disk space is allocated for the delayed allocation blocks before the data
> is flushed to disk [linvfs_writepage, page_state_convert].
>
> file image
> offset=0 eof
> +----+----+----+----+----+- ... +----+
> | | | | | | | |
> +----+----+----+----+----+- ... +----+
> 4KB 4KB
> +---------+
> write data (write1)
> +------------------------------------+
> allocated disk space
> +---------+
> called disk space1
> +--------------------------+
> called disk space2
>
> 3. close in nfsd (for write1)
> nfsd calls close [linvfs_release]. At this point, the allocated disk space
> beyond the end of the file (disk space2) is truncated, because the refcache
> is disabled [xfs_inactive_free_eofblocks] (see the sketch after the diagram
> below).
>
> file image
> offset=0 eof
> +----+----+
> | | |
> +----+----+
> 4KB 4KB
> +---------+
> write data (write1)
> +---------+
> disk space1
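>
> What the truncation at close amounts to, as a sketch (this is not the
> real xfs_inactive_free_eofblocks; the struct and names are made up):
>
>	struct inode_sketch {
>		unsigned long long isize;	/* end of file       */
>		unsigned long long alloc_end;	/* end of allocation */
>	};
>
>	/* Sketch: on close, free any blocks allocated past EOF.
>	 * Disk space2 goes back to the free space pool and may later
>	 * be given to another file.                                   */
>	static void free_eofblocks_sketch(struct inode_sketch *ip)
>	{
>		if (ip->alloc_end > ip->isize)
>			ip->alloc_end = ip->isize;
>	}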
>
> 4. open and write in nfsd (for write2)
> The nfs client then writes another 8KB of data to the file (called write2).
> nfsd calls open [linvfs_open] and then write [linvfs_write].
>
> file image
> offset=0 eof
> +----+----+----+----+----+- ... +----+
> | | | | | | | |
> +----+----+----+----+----+- ... +----+
> 4KB 4KB 4KB 4KB
> +---------+
> write data (write1)
> +---------+
> write data (write2)
> +--------------------------+
> delayed allocation blocks
> +---------+
> disk space1
>
> 5. flush data to disk in kupdated (for write1)
> The write data (write1) is flushed to disk space1 [page_state_convert].
> But the write data (write2) is also flushed, to disk space2 [cluster_write] !!!,
> because the buffers of write data (write2) are marked dirty and delay.
> However, disk space2 does not exist at this time; it may already be used by
> another file or be free space (see the sketch below).
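>
> Illustrative sketch of the clustering behaviour (not the real
> cluster_write code; the types and names are made up):
>
>	struct page_sketch {
>		int dirty, delay;	/* buffer state bits */
>	};
>
>	static void write_page_to_block(struct page_sketch *p,
>					unsigned long long blk)
>	{
>		/* stand-in for the real I/O submission */
>		(void)p; (void)blk;
>	}
>
>	/* Sketch: after write1's pages are mapped to disk space1, the
>	 * flush keeps clustering the following dirty+delay buffers and
>	 * writes them to the next blocks of the old mapping -- blocks
>	 * that used to be disk space2 but were freed at close.        */
>	static void cluster_write_sketch(struct page_sketch *pages,
>					 int npages,
>					 unsigned long long disk_blk)
>	{
>		int i;
>
>		for (i = 0; i < npages; i++) {
>			if (!(pages[i].dirty && pages[i].delay))
>				break;
>			write_page_to_block(&pages[i], disk_blk + i);
>		}
>	}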
>
> I think that one solution for the issue is to flush only the buffers up to
> the end of the file before allocating disk space for delayed allocation
> blocks, and not to flush buffers beyond that (rough sketch of the idea below).
> I made a patch for xfs1.3.1. I am running the test on a kernel with the patch
> applied; it has been running for over 16 hours with no corruption.
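>
> The general idea as a sketch (this is NOT the attached patch, just an
> illustration with made-up names):
>
>	/* Sketch: when converting/flushing delayed allocation pages,
>	 * only consider pages that lie inside the file, so nothing
>	 * beyond EOF is ever flushed or mapped to disk blocks.        */
>	static int should_flush_page(unsigned long long page_offset,
>				     unsigned long long isize)
>	{
>		return page_offset < isize;
>	}
>
> In terms of step 5 above, the clustering would then stop at EOF instead
> of spilling into blocks that the file no longer owns.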
>
> Could you please comment on the attached patch?
>
> Regards,
> Tsuda
>
> In message "data corruption on nfs+xfs"
> (04/05/27 15:58:48),
> kazuyuki@xxxxxxxxxxxxxxxxxxx wrote...
> >We are experiencing the same problem as No.198.
> > http://oss.sgi.com/bugzilla/show_bug.cgi?id=198
> > http://marc.theaimsgroup.com/?t=108343605300001&r=1&w=2
> >
> >We have confirmed that even when the refcache is disabled, by setting
> >fs.xfs.refcache_size to zero through sysctl, the problem does not disappear.
> >Running linux in single-CPU mode makes the problem slightly harder to trigger,
> >but it still occurs.
> >
> >Two types of corruption we've seen:
> >
> > 1) Width is a multiple of 8kB, starting at an 8kB boundary.
> > *Maybe the same trouble as No.198.
> >
> > 2) Width is 964 bytes, ending at a 4kB boundary.
> > *I'm not sure the cause is the same as 1) above.
> >
> >We have tested on 2.4.20-20.9.XFS1.3.1, 2.4.20-30.9.sgi1 XFS1.3.3, and other
> >kernels based on 2.4.20-20 to which we made some changes.
> >
> >Does anyone know where the cause lies: in the page cache, disk block handling,
> >or other parts?
> >Or does anyone know how to avoid this with some setting or another version?
> >