xfs: invalid requests to request_fn from xfs_repair

Jamie Pocas pocas.jamie at gmail.com
Tue Apr 1 15:16:39 CDT 2014


Hi folks,

I have a very simple block device driver that uses the request_fn style of
processing instead of the older bio handling or newer multiqueue approach.
I have been using this with ext3 and ext4 for years with no issues, but
scalability requirements have dictated that I move to xfs to better support
larger devices.
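
For context, the driver's request path has roughly the shape sketched below
(a sketch against the 3.2 API; the ca_* names and the error handling are
illustrative, not the actual driver code):

/* Needs linux/blkdev.h and linux/spinlock.h. */
static DEFINE_SPINLOCK(ca_lock);

static void ca_request_fn(struct request_queue *q)
{
    struct request *rq;

    /* The block layer calls this with the queue lock held. */
    while ((rq = blk_fetch_request(q)) != NULL) {
        if (rq->cmd_type != REQ_TYPE_FS) {
            __blk_end_request_all(rq, -EIO);
            continue;
        }
        /* ... walk the segments with rq_for_each_segment() and copy
         * data to/from the backing store ... */
        __blk_end_request_all(rq, 0);
    }
}

/* During device setup: q = blk_init_queue(ca_request_fn, &ca_lock);
 * followed by the blk_queue_* limit calls shown below. */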

I'm observing something weird in my request_fn. It seems like the block
layer is issuing invalid requests to my request function, and it really
manifests when I use xfs_repair. Here's some info:

blk_queue_physical_block_size(q, 512);  /* should be no surprise */
blk_queue_logical_block_size(q, 512);   /* should be no surprise */
blk_queue_max_segments(q, 128);         /* 128 memory segments (page + offset/length pairs) per request */
blk_queue_max_hw_sectors(q, CA_MAX_REQUEST_SECTORS);  /* up to 1024 sectors (512k) per request, hard limit in the kernel */
blk_queue_max_segment_size(q, CA_MAX_REQUEST_BYTES);  /* 512k (1024 sectors) is the hard limit in the kernel */

While iterating through segments in rq_for_each_segment(), for some
requests I am seeing some odd behavior.

segment 0: iter.bio->bi_sector = 0,    blk_rq_cur_sectors(rq) = 903  // Ok, this looks normal
segment 1: iter.bio->bi_sector = 1023, blk_rq_cur_sectors(rq) = 7    // Whoa... this doesn't look right to me
...
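
For reference, that output comes from a debug loop along these lines (a
sketch of the kind of loop that produces it, not the driver's exact code):

struct req_iterator iter;
struct bio_vec *bvec;   /* pre-3.14 API: a pointer, not a struct value */
unsigned int seg = 0;

rq_for_each_segment(bvec, rq, iter) {
    printk(KERN_DEBUG "segment %u: iter.bio->bi_sector = %llu, "
           "blk_rq_cur_sectors(rq) = %u\n",
           seg++,
           (unsigned long long)iter.bio->bi_sector,
           blk_rq_cur_sectors(rq));
}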

You can see with segment 1 that the start sector is *NOT* adjacent to the
previous segment's sectors (there's a gap from sector 903 through 1022), and
that this "sparse" request, for lack of a better term, extends beyond the
512k max I/O boundary. Furthermore, this doesn't seem to jibe with what
userspace is doing, which is a simple 512k read, all in one chunk, into a
single userspace address.

But when you look at the strace of what xfs_repair is doing, it's just an
innocuous read of 512k from sector 0.

write(2, "Phase 1 - find and verify superb"..., 40Phase 1 - find and verify
superblock...
) = 40
mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x7f00e2f42000
mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x7f00e2ec1000
lseek(4, 0, SEEK_SET)                   = 0
read(4, 0x7f00e2ec1200, 524288)         = -1 EIO (Input/output error)
write(2, "superblock read failed, offset 0"..., 61superblock read failed,
offset 0, size 524288, ag 0, rval -1
) = 61

The reason you see the EIO is that I am failing the request in the driver:
it violates the restrictions I set earlier and is non-adjacent, so I am
unable to satisfy it.

*Point 1:* Shouldn't a request contain only segments that are adjacent on
disk? E.g. if, before the rq_for_each_segment() loop, blk_rq_pos(rq) is 10
and blk_rq_cur_sectors(rq) is 10, then on the next iteration (if any)
iter.bio->bi_sector should be 10 + 10 = 20. Is my understanding correct? Or
are these some kind of special requests that should be handled differently?
(I know that DISCARD requests have to be handled differently and shouldn't
be run through rq_for_each_segment(), and that FLUSH requests are often
empty.) The cmd_type says these are normal REQ_TYPE_FS requests.
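
To make the adjacency expectation concrete, the check I have in mind is
roughly the following (a sketch against the 3.2 headers;
ca_request_is_contiguous is an illustrative name, not a function from the
driver):

static bool ca_request_is_contiguous(struct request *rq)
{
    struct bio *bio;
    sector_t expected = blk_rq_pos(rq);

    __rq_for_each_bio(bio, rq) {
        /* every bio merged into the request should start right where
         * the previous one ended */
        if (bio->bi_sector != expected)
            return false;
        expected += bio_sectors(bio);   /* bi_size >> 9 */
    }
    return true;
}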

*Point 2:* If I ignore the incorrect iter.bio->bi_sector and just read/write
the request as if it were adjacent, xfs_repair reports corruption, and sure
enough there are inodes that are zeroed out instead of having the expected
inode magic 0x494e ("IN"). So mkfs.xfs, while not sending what appear to be
illegal requests, still results in corruption.

*Point 3:* Interestingly, this goes away when I set
blk_queue_max_segments(q, 1), but that obviously prevents request clustering
and kills performance. Is this indicative of anything in particular that I
could be doing wrong?

Please cut me some slack when I say something like xfs_repair is "sending"
invalid requests. I know that the C library, system call interface, block
layer, etc. sit in between; I simply mean that using this tool results in
this unexpected behavior. I don't mean to point blame at xfs or xfsprogs. If
this turns out to be a block layer issue and this posting needs to go
elsewhere, I apologize and would appreciate being pointed in the right
direction.

It almost feels like the block layer is splitting the bios up wrongly,
corrupting the bvecs, or introducing a race. What's strange, again, is that
I have only seen this behavior with the xfs tools, not with ext3 or ext4 and
e2fsprogs, which have worked for years. It really shouldn't matter, though,
because mkfs.xfs and xfs_repair are userspace tools, so they shouldn't cause
the block layer in the kernel to send down invalid requests. I have been
grappling with this for a few weeks, and I am tempted to switch to the older
bio-handling approach instead just to see if that works out better for me,
but that would be a big rewrite of the LLD. I am using an older Ubuntu 12.04
kernel (3.2.x), so I am not able to move to the newer multiqueue
implementation.


Any ideas/suggestions?
Need more information?


Thanks and Regards,
Jamie Pocas