http://oss.sgi.com/bugzilla/show_bug.cgi?id=339
Summary: data loss problem
Product: Linux XFS
Version: unspecified
Platform: IA32
OS/Version: Linux
Status: NEW
Severity: major
Priority: Medium
Component: XFS kernel code
AssignedTo: xfs-master@xxxxxxxxxxx
ReportedBy: tsuda@xxxxxxxxxxxxxx
Hello Russell and XFS people,
I reproduced a data loss problem caused by inode i_size and
xfs_inode di_size being updated at different times.
My environment is as follows.
OS : RedHat9 + 2.4.26 + patch for XFS bugzilla #198
CPU : Pentium4 (2.53GHz)
MEM : 256MB
I made two simple test programs to reproduce the problem easily.
Test program 1 (test-enospc-write.c)
This program writes data to a file until ENOSPC, then reads the
data back to verify it, and repeats this cycle indefinitely.
If the program detects bad data, it prints the following message
and stops.
*** error : bad data image=... offset=...
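
For readers without the attachment, here is a minimal sketch of one write-then-verify cycle of the kind described above. It is not the original test-enospc-write.c: the function names, the byte pattern, the 4096-byte block size, the limit parameter and the return conventions are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 4096   /* assumed write size, not taken from the report */

/* Hypothetical pattern: every byte depends on its absolute file offset,
 * so zeroed or misplaced data is detectable on read-back. */
static unsigned char pattern_byte(off_t pos)
{
    return (unsigned char)(pos + (pos >> 12) + 1);
}

/* Write the pattern to path until ENOSPC, or until limit bytes when
 * limit > 0 (the limit exists only so the sketch can be tried safely).
 * Returns the number of bytes written, or -1 on any other error. */
static off_t write_until_full(const char *path, off_t limit)
{
    unsigned char buf[BLOCK_SIZE];
    off_t total = 0;
    int fd;

    assert(path != NULL);
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    for (;;) {
        size_t len = BLOCK_SIZE;
        if (limit > 0 && limit - total < (off_t)len)
            len = (size_t)(limit - total);
        if (len == 0)
            break;                      /* reached the test limit */
        for (size_t i = 0; i < len; i++)
            buf[i] = pattern_byte(total + (off_t)i);
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == ENOSPC)
                break;                  /* file system is full */
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        if (n == 0)
            break;
        total += n;                     /* a short write just advances less */
    }
    if (close(fd) < 0)
        return -1;
    return total;
}

/* Read back expected bytes and compare them with the pattern.
 * Returns -1 if everything matches, the offset of the first bad or
 * missing byte otherwise, or -2 on an I/O error. */
static off_t verify_file(const char *path, off_t expected)
{
    unsigned char buf[BLOCK_SIZE];
    off_t pos = 0;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -2;
    while (pos < expected) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n < 0) {
            close(fd);
            return -2;
        }
        if (n == 0)
            break;                      /* file is shorter than what was written */
        for (ssize_t i = 0; i < n && pos < expected; i++, pos++) {
            if (buf[i] != pattern_byte(pos)) {
                fprintf(stderr, "bad data at offset %lld\n", (long long)pos);
                close(fd);
                return pos;
            }
        }
    }
    close(fd);
    return pos < expected ? pos : -1;
}
```

A driver in the spirit of the description would call write_until_full(path, 0) and then verify_file() in a loop, stopping as soon as verify_file() returns something other than -1.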
Test program 2 (test-enospc-chmod.c)
This program calls chmod on the file in an infinite loop.
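
As a rough illustration only, a chmod loop of this kind could look like the sketch below. It is not the original test-enospc-chmod.c; the function name chmod_loop, the bounded round count and the two mode values are assumptions. The report's scenario only needs chmod to be called repeatedly so that the setattr path runs while the writer is active.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Alternate the mode of path between two values for `rounds` calls.
 * Stops at the first failure. Returns the number of successful calls.
 * (The program in the report loops forever; the bound is an assumption
 * that makes the sketch easy to try.) */
static long chmod_loop(const char *path, long rounds)
{
    static const mode_t modes[2] = { 0644, 0600 };
    long ok = 0;

    assert(path != NULL);
    for (long i = 0; i < rounds; i++) {
        if (chmod(path, modes[i % 2]) != 0) {
            perror("chmod");
            break;
        }
        ok++;
    }
    return ok;
}
```

Calling chmod_loop() from an endless outer loop would approximate the behaviour described for test program 2.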
I was able to reproduce the problem within 5 minutes in my environment
by running these two programs simultaneously.
The procedure is as follows.
# gcc -o test-enospc-write test-enospc-write.c
# gcc -o test-enospc-chmod test-enospc-chmod.c
# mkfs -t xfs -f -d size=512m /dev/hda9
# mount -t xfs /dev/hda9 /mnt/xfs
# ./test-enospc-write /mnt/xfs/file1 &
# ./test-enospc-chmod /mnt/xfs/file1 &
I have investigated the problem using a local trace embedded in the kernel.
The problem seems to occur in the following scenario
(in chronological order; TP1 and TP2 are test programs 1 and 2).
1. TP1 sleeps in the middle of processing a write request
After TP1 has processed mKB of a write request, it calls
balance_dirty() to relieve memory pressure and sleeps
while dirty buffers are flushed (waiting for resources).
At this point, xfs_inode di_size is smaller than inode i_size,
because a_ops->commit_write updates only inode i_size.
             file image
             offset=0      lKB
                    +--+...+--+--+
                    |  |   |  |  |
                    +--+...+--+--+
                           +-----+
                             mKB
inode i_size      : -------------> (l+mKB)
xfs_inode di_size : -------> (lKB)
2. TP2 revalidates inode i_size
TP2 calls vn_revalidate() in linvfs_setattr().
This sets inode i_size to the same value as
xfs_inode di_size.
As a result, inode i_size is lKB!
             file image
             offset=0      lKB
                    +--+...+--+--+
                    |  |   |  |  |
                    +--+...+--+--+
                           +-----+
                             mKB
inode i_size      : -------> (lKB)
xfs_inode di_size : -------> (lKB)
3. TP1 flushes dirty and delayed buffers
TP1 wakes up and continues processing the write request.
TP1 detects ENOSPC in xfs_iomap_write_delay() and
calls xfs_flush_space() to get free space
by flushing dirty and delayed buffers.
But flushing the buffers written by the current write request fails,
and these buffers are discarded in xfs_page_state_convert(),
because they lie beyond inode i_size.
4. TP1 updates both inode i_size and xfs_inode di_size
TP1 sets both inode i_size and xfs_inode di_size to
l+nKB (nKB >= mKB) at the end of processing the write request.
But some of the written data has been lost.
             file image
             offset=0      lKB
                    +--+...+--+--+--+
                    |  |   |  |  |  |
                    +--+...+--+--+--+
                           +--------+
                              nKB
                           +-----+
                           data loss
inode i_size      : -------------------> (l+nKB)
xfs_inode di_size : -------------------> (l+nKB)
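
The four steps above can be replayed with a small user-space model. This is only a sketch under simplifying assumptions: it is not kernel code, every name in it (toy_inode, toy_commit_write and so on) is hypothetical, offsets are in KB, and flushing is reduced to "keep what lies below i_size, discard the rest".

```c
#include <assert.h>
#include <stdio.h>

/* Toy model of one file; all offsets are in KB. */
struct toy_inode {
    long i_size;       /* size in the VFS inode */
    long di_size;      /* size in the xfs_inode */
    long flushed_end;  /* data below this offset has been written back */
    long written_end;  /* data below this offset has been buffered */
    long lost;         /* KB of buffered data that was discarded */
};

static void toy_init(struct toy_inode *ip, long l)
{
    assert(l >= 0);
    ip->i_size = ip->di_size = ip->flushed_end = ip->written_end = l;
    ip->lost = 0;
}

/* Step 1: buffering m KB advances only i_size, di_size stays behind. */
static void toy_commit_write(struct toy_inode *ip, long m)
{
    ip->written_end += m;
    ip->i_size = ip->written_end;
}

/* Step 2: revalidation copies di_size over i_size. */
static void toy_revalidate(struct toy_inode *ip)
{
    ip->i_size = ip->di_size;
}

/* Step 3: flushing keeps buffers below i_size and discards the rest. */
static void toy_flush(struct toy_inode *ip)
{
    long keep_end = ip->written_end < ip->i_size ? ip->written_end
                                                 : ip->i_size;
    if (keep_end < ip->flushed_end)
        keep_end = ip->flushed_end;
    ip->lost += ip->written_end - keep_end;
    ip->flushed_end = ip->written_end;
}

/* Step 4: the end of the write request sets both sizes. */
static void toy_finish_write(struct toy_inode *ip)
{
    ip->i_size = ip->di_size = ip->written_end;
}

/* Replay the scenario for a file of l KB and a write of n KB that is
 * interrupted after m KB. with_chmod selects whether TP2 interferes.
 * Returns the number of KB lost. */
static long toy_scenario(struct toy_inode *ip, long l, long m, long n,
                         int with_chmod)
{
    assert(n >= m);
    toy_init(ip, l);
    toy_commit_write(ip, m);      /* step 1 */
    if (with_chmod)
        toy_revalidate(ip);       /* step 2 */
    toy_flush(ip);                /* step 3 */
    toy_commit_write(ip, n - m);  /* rest of the write request */
    toy_finish_write(ip);         /* step 4 */
    printf("i_size=%ldKB di_size=%ldKB lost=%ldKB\n",
           ip->i_size, ip->di_size, ip->lost);
    return ip->lost;
}
```

In this model the final sizes look correct in both runs; only the lost counter shows that the mKB written before the revalidation never reached the disk.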
I made a patch for 2.4.26 which updates both inode i_size
and xfs_inode di_size together in a_ops->commit_write.
I am running the test on a kernel with this patch applied;
it has been running for over 6 hours with no data loss.
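
The idea behind the patch can be illustrated with a tiny model of the two size fields. This is not the patch itself, which is not reproduced here; the struct and function names are hypothetical, sizes are in KB, and revalidation is reduced to copying di_size over i_size.

```c
#include <assert.h>

/* The two size fields discussed in the report, in KB. */
struct size_pair {
    long i_size;   /* VFS inode size */
    long di_size;  /* xfs_inode size */
};

/* Behaviour described for the unpatched kernel: only i_size advances. */
static void commit_unpatched(struct size_pair *s, long end)
{
    if (end > s->i_size)
        s->i_size = end;
}

/* Behaviour described for the patched kernel: both sizes advance together. */
static void commit_patched(struct size_pair *s, long end)
{
    if (end > s->i_size)
        s->i_size = end;
    if (end > s->di_size)
        s->di_size = end;
}

/* Revalidation modelled as copying di_size over i_size. */
static void revalidate(struct size_pair *s)
{
    s->i_size = s->di_size;
}

/* Returns 1 if a revalidation right after committing m KB at offset l
 * would shrink i_size, 0 otherwise. */
static int revalidate_shrinks(int patched, long l, long m)
{
    struct size_pair s = { l, l };
    long before;

    assert(l >= 0 && m > 0);
    if (patched)
        commit_patched(&s, l + m);
    else
        commit_unpatched(&s, l + m);
    before = s.i_size;
    revalidate(&s);
    return s.i_size < before;
}
```

With both sizes advanced in the same step, a concurrent revalidation has nothing to roll back, so the buffers of the in-flight write stay below i_size in this model.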
---
Masanori TSUDA