| To: | Dave Chinner <david@xxxxxxxxxxxxx> |
|---|---|
| Subject: | Re: xfs: invalid requests to request_fn from xfs_repair |
| From: | Jamie Pocas <pocas.jamie@xxxxxxxxx> |
| Date: | Tue, 1 Apr 2014 18:28:19 -0400 |
| Cc: | xfs@xxxxxxxxxxx |
| Delivered-to: | xfs@xxxxxxxxxxx |
| In-reply-to: | <20140401204215.GG17603@dastard> |
| References: | <CABHf-sxmxmM0+WVzvGGJqKrrGngm0qrGTsYDnEmUEf+GJ_pK8A@xxxxxxxxxxxxxx> <20140401204215.GG17603@dastard> |
I have to say, thanks very much for the lightning-fast response. Comments are inline. I think the punchline is going to end up being that I probably need to learn some more about the effects of O_DIRECT on the I/O path and how it ends up in the request_queue. It's tempting now to fall back to a bio-handling (make_request) function, do my own merging, and see if I can avoid the problem, but I am stubborn, so I will probably stick it out for as long as I can.
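For illustration, here's roughly what that fallback would look like: a minimal bio-handling sketch, assuming the ~3.2 kernel API (where ->make_request_fn returns void); the tsca_* names and the tsca_transfer() helper are hypothetical, and highmem handling is omitted. With this structure the driver sees each bio exactly as submitted, with no request-queue merging at all.

```c
#include <linux/blkdev.h>
#include <linux/bio.h>

/* Hypothetical per-range transfer helper (not the real driver's). */
static void tsca_transfer(sector_t sector, void *buf, unsigned int len, int rw);

static struct request_queue *tsca_queue;

/* Bio-handling entry point: every bio arrives here directly, unmerged. */
static void tsca_make_request(struct request_queue *q, struct bio *bio)
{
	sector_t sector = bio->bi_sector;	/* start of this bio */
	struct bio_vec *bvec;
	int i;

	bio_for_each_segment(bvec, bio, i) {
		/* Assumes lowmem pages for brevity; a real driver would
		 * kmap() highmem pages. */
		void *buf = page_address(bvec->bv_page) + bvec->bv_offset;

		tsca_transfer(sector, buf, bvec->bv_len, bio_data_dir(bio));
		sector += bvec->bv_len >> 9;
	}
	bio_endio(bio, 0);
}

static int tsca_init_queue(void)
{
	tsca_queue = blk_alloc_queue(GFP_KERNEL);
	if (!tsca_queue)
		return -ENOMEM;
	/* No request_fn, no elevator/merging: the driver owns each bio. */
	blk_queue_make_request(tsca_queue, tsca_make_request);
	return 0;
}
```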
On Tue, Apr 1, 2014 at 4:42 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
I guess my assumption that sectors would be adjacent was invalid, but this is what I have read in LDD3 and other books and seen in some driver examples, so I apologize for my folly; it's another lesson in not believing everything you see on the net. The back half of my driver is really optimized for transferring a single contiguous LBA and transfer-length pair (à la SCSI WRITE(16), for example). However, in desperation I had changed the driver to support requests like this (with disjoint sector ranges), but it still seemed odd to me that occasionally some requests would even have multiple segments that would *write* the same sectors/lengths. For example, I also see requests like the following.
Given: rq_data_dir(rq) == 1 (i.e. a write)

```
segment 0:   iter.bio->bi_sector = 0,    blk_rq_cur_sectors(rq) = 7   // Ok
segment 1:   iter.bio->bi_sector = 0,    blk_rq_cur_sectors(rq) = 7   // Again?
segment 2:   iter.bio->bi_sector = 0,    blk_rq_cur_sectors(rq) = 7   // Yet again?
segment 3:   iter.bio->bi_sector = 0,    blk_rq_cur_sectors(rq) = 7   // Really?
...
segment 126: iter.bio->bi_sector = 0,    blk_rq_cur_sectors(rq) = 7   // What's going on?
// Reminder: segment limit is 128
segment 127: iter.bio->bi_sector = 1023, blk_rq_cur_sectors(rq) = 1   // Haha, bet you thought I was going to say sector = 0!
```

This makes sense for reads, since maybe multiple user buffers wanted to read the same sectors at [around] the same time, so it batched them into one request (temporal + spatial locality), but it doesn't make a lot of sense to me for writes. That seems inefficient at best. Wouldn't the last write win?
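One thing I still want to double-check on my side (a guess about my own logging, not a claim about the block layer): iter.bio->bi_sector is the start sector of the whole bio that owns the current segment, and blk_rq_cur_sectors(rq) describes the request's current first segment rather than the segment the iterator is on, so a log like the one above can repeat the same pair for every bvec of one large bio. A sketch of what I mean, assuming the ~3.2 iteration API (the function name is made up):

```c
#include <linux/blkdev.h>
#include <linux/bio.h>

/* Hypothetical debug dump: keep a running sector inside each bio so every
 * segment is reported with its own start sector and its own length. */
static void tsca_dump_request(struct request *rq)
{
	struct req_iterator iter;
	struct bio_vec *bvec;
	struct bio *prev_bio = NULL;
	sector_t sector = 0;
	unsigned int seg = 0;

	rq_for_each_segment(bvec, rq, iter) {
		if (iter.bio != prev_bio) {
			/* New bio within the request: restart the running
			 * sector at that bio's own start sector. */
			sector = iter.bio->bi_sector;
			prev_bio = iter.bio;
		}
		printk(KERN_DEBUG "seg %u: sector %llu, %u sectors (%s)\n",
		       seg++, (unsigned long long)sector, bvec->bv_len >> 9,
		       rq_data_dir(rq) ? "write" : "read");
		sector += bvec->bv_len >> 9;
	}
}
```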
Even so, I occasionally see requests with more segments than the max (though perhaps I am counting segments wrong, too). Jens Axboe commented on this years ago, but it still seems to happen occasionally on my Ubuntu 12.04 machine with the 3.2 kernel: https://lkml.org/lkml/2009/2/4/465
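For what it's worth, these are the limits I believe the block layer is supposed to build requests within; a sketch of how a request-based driver declares them, assuming the ~3.2 setter names (the numbers are only illustrative, not my driver's actual values):

```c
#include <linux/blkdev.h>

/* Illustrative queue limits; the block layer should not hand the driver
 * requests that exceed these. */
static void tsca_set_limits(struct request_queue *q)
{
	blk_queue_logical_block_size(q, 512);	/* device sector size */
	blk_queue_max_segments(q, 128);		/* max segments per request */
	blk_queue_max_hw_sectors(q, 1024);	/* max 512 KiB per request */
	blk_queue_max_segment_size(q, 65536);	/* max bytes per segment */
}
```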
In general, if I were reading a file, I would agree 100%, but this was the read call issued by xfs_repair to read the superblock directly from a block device. So I, perhaps wrongly, anticipated straightforward requests, or maybe even a single 512k request, coming down to the request_fn. I should have mentioned earlier in the email that this was observed while using xfs_repair, so no offense intended. There's more of the xfs_repair strace in my response to your next comment below. As you can imagine, it fails very early when xfs_repair can't read the superblock.
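For reference, here's roughly what that superblock read looks like from userspace (a minimal sketch, not xfs_repair's actual code): an O_DIRECT read wants the buffer, offset, and length aligned to the device's logical sector size; the /dev/tsca0 device and the 512 KiB size match the strace further down.

```c
#define _GNU_SOURCE		/* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	void *buf;
	ssize_t n;
	int fd = open("/dev/tsca0", O_RDONLY | O_DIRECT);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* Page alignment satisfies any sane logical sector size. */
	if (posix_memalign(&buf, 4096, 524288)) {
		perror("posix_memalign");
		return 1;
	}
	n = pread(fd, buf, 524288, 0);	/* 512 KiB from offset 0 */
	if (n < 0)
		perror("pread");	/* an EIO here would match the trace */
	else
		printf("read %zd bytes\n", n);
	free(buf);
	close(fd);
	return 0;
}
```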
Yes, sorry about that; I was trying not to bloat the thread with all the localization-loading noise. I'm going to strip some of that out and post more of the strace below. I *do* see the O_DIRECT flag as expected. Here's the block device being opened. All I did was run "xfs_repair -n /dev/tsca0", hence the O_RDONLY flag.
open("/dev/tsca0", O_RDONLY|O_DIRECT) = 4 fstat(4, {st_mode=S_IFBLK|0660, st_rdev=makedev(251, 0), ...}) = 0 fstat(4, {st_mode=S_IFBLK|0660, st_rdev=makedev(251, 0), ...}) = 0
ioctl(4, BLKGETSIZE64, 0x7fff8f759270) = 0 ioctl(4, BLKSSZGET, 0x6732e0) = 0 chdir("/home/jpocas/source/xfs/drivers") = 0 brk(0x23f9000) = 0x23f9000
close(3) = 0 getrlimit(RLIMIT_FSIZE, {rlim_cur=RLIM_INFINITY, rlim_max=RLIM_INFINITY}) = 0 open("/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=2570, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f00e2fc2000 read(3, "# Locale name alias data base.\n#"..., 4096) = 2570
read(3, "", 4096) = 0 close(3) = 0 munmap(0x7f00e2fc2000, 4096) = 0 write(2, "Phase 1 - find and verify superb"..., 40Phase 1 - find and verify superblock... ) = 40 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f00e2f42000 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f00e2ec1000
lseek(4, 0, SEEK_SET) = 0 read(4, 0x7f00e2ec1200, 524288) = -1 EIO (Input/output error) write(2, "superblock read failed, offset 0"..., 61superblock read failed, offset 0, size 524288, ag 0, rval -1
) = 61 write(2, "\nfatal error -- ", 16 fatal error -- ) = 16 write(2, "Input/output error\n", 19Input/output error ) = 19 exit_group(1) = ?
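Side note: the BLKGETSIZE64 and BLKSSZGET calls in the trace (how xfs_repair learns the device size and logical sector size) can be reproduced with a few lines of C, something like this sketch:

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

int main(void)
{
	uint64_t bytes = 0;
	int ssz = 0;
	int fd = open("/dev/tsca0", O_RDONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	ioctl(fd, BLKGETSIZE64, &bytes);	/* device size in bytes */
	ioctl(fd, BLKSSZGET, &ssz);		/* logical sector size */
	printf("%llu bytes, %d-byte sectors\n",
	       (unsigned long long)bytes, ssz);
	close(fd);
	return 0;
}
```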
Got it. I need to look at the DIO code path in the kernel.

> *Point 2:* If I ignore the incorrect iter.bio->bi_sector, and just

Sorry, this was just out of desperation and another data point, ridiculous as it may sound. I have been looking at this for a couple weeks now so I am taking wild shots in the dark.
That makes sense and I do see that O_DIRECT is being used by both mkfs.xfs and xfs_repair. I guess I am off to investigate the O_DIRECT code path.