Howdy, again!
As you may remember, Scott Smyth posted a plea for help here last week
(or early this week) regarding XFS on RAID5. If you try to mount XFS on
a RAID5 array while it is rebuilding, the rebuild stops, as evidenced by
the numbers in /proc/mdstat never changing. He asked me to work on this
too, and I have some more insight to share.
The function md_do_sync in md.c is the scheduling exec for raid5syncd.
Around line 3353, a for loop is started. It calls raid5_sync_request
(in raid5.c), which updates some number of blocks and returns the count,
which md_do_sync stores in the blocks variable. During the failure, I've
verified that blocks is 0! Thus j (the loop index) never advances, and
the loop never exits. It also never reaches the 'md_signal_pending'
call, which is consistent with my findings: kill(1) doesn't work on the
runaway raid5syncd.
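
To make the stall concrete, here is a minimal sketch of a loop whose
index only advances by the amount the sync call reports. This is a
simplified model, not the md.c source: the function name
model_sync_loop, the blocks_per_call parameter, and the max_iters cap
are all hypothetical, and the cap exists only so the sketch can report
the stall instead of spinning forever.

```c
/* Simplified, hypothetical model of the resync loop described above.
 * blocks_per_call stands in for the value returned by the sync request.
 * Returns the number of iterations needed to cover max_blocks, or -1 if
 * the iteration cap is hit first (the real loop has no such cap). */
long model_sync_loop(long max_blocks, long blocks_per_call, long max_iters)
{
    long j = 0;      /* loop index: blocks synced so far */
    long iters = 0;  /* how many times the loop body has run */

    while (j < max_blocks) {
        if (iters >= max_iters)
            return -1;          /* no completion within the cap */
        iters++;
        j += blocks_per_call;   /* j only advances if progress is reported */
    }
    return iters;
}
```

With blocks_per_call set to 4, model_sync_loop(8, 4, 100) finishes in 2
iterations; with blocks_per_call set to 0, j stays at 0 and the call
returns -1 once the cap is reached.
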
Tracing back into raid5_sync_request, the last statement is:
    return (bufsize>>10)-redone;
Now, bufsize comes from sh->size, which is the MD blocksize. I've
noticed several times now that kernel messages from raid5 about the
bufsize changing to/from 512, 1024, and 4096 occur just before any
oopsen/panics/crashes. If bufsize is really 512, then 512 >> 10 is 0!
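
The arithmetic can be checked in isolation. The helper below is a
hypothetical wrapper (blocks_from_bufsize is not a kernel function)
around the same expression as the return statement quoted above, taking
bufsize in bytes. Shifting right by 10 divides by 1024 and discards the
remainder, so any bufsize below 1024 yields 0 before redone is even
subtracted.

```c
/* Hypothetical helper mirroring the quoted return expression.
 * bufsize is in bytes; >> 10 is integer division by 1024. */
int blocks_from_bufsize(int bufsize, int redone)
{
    return (bufsize >> 10) - redone;
}
```

For the three sizes mentioned in the kernel messages and redone equal to
0, this gives 0 for 512, 1 for 1024, and 4 for 4096.
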
In fact, even with a 'clean' RAID5, in just copying a directory full of
"The Allman Brothers" MP3s to my XFS on RAID5, these blocksize change
messages occur several times, after which the kernel oopses, panics, or
reboots; which of the three happens is inconsistent. Hopefully, this
isn't due to the content of the files ;-).
Now my main question: why does XFS change the block size? Also, when I
attempt the mkfs.xfs, md reports: "mkfs.xfs(pid 527) used obsolete MD
ioctl, upgrade your software to use new ictls." Is this an attempt by
mkfs.xfs to change the blocksize?
Thanks for your time!
--
"Men occasionally stumble over the truth, but most of them pick
themselves up and hurry off as if nothing had happened."
-- Winston Churchill
Danny