On 09 Mar 2001 11:45:51 -0500, Vladimir Vukicevic wrote:
>
>
> Hmm. I now have a repeatable case of this I/O error, but as the other
> end is running the kernel nfsd server, I'm not exactly sure how to debug
> it or turn on debugging on the other end... A 'cat foo.ogg > /dev/null' on
> the mounted partition repeatably gives an I/O error.
>
> Any thoughts on how to diagnose this? I'll keep poking..
Doh. Forgot about tcpdump. :-) So, this is what I'm seeing:

11:57:46.159530 rain.ximian.priv.4040915765 > ogg.nfs: 140 lookup fh Unknown/1 "01-letters_from_the_wasteland.ogg" (DF)
11:57:46.163988 ogg.nfs > rain.ximian.priv.4040915765: reply ok 128 lookup fh Unknown/1 (DF)

So, the lookup goes okay. Then the weirdness starts.

11:57:46.173585 rain.ximian.priv.4057692981 > ogg.nfs: 112 read fh Unknown/1 4096 bytes @ 2408448 (DF)
11:57:46.217986 ogg > rain.ximian.priv: (frag 20288:1480@1480+)
11:57:46.220115 ogg.nfs > rain.ximian.priv.4057692981: reply ok 1472 read (frag 20288:1480@0+)

Looking at this more in Ethereal, this call/reply sequence has XID
0xf1db7b35. It appears to succeed (although I'm confused about why it's
reading at offset 2408448). However, the next 3 calls have exactly the
same XID (Ethereal marks them as dups), and they read the same size at
the same offset.
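To line the tcpdump output up with what Ethereal shows, it helps to know that tcpdump's NFS decoder prints a call's RPC transaction ID (XID) where the source port would normally go, so the long decimal number after rain.ximian.priv is the XID, not a port. A minimal Python sketch that pulls those numbers out of the trace and prints them in hex; the regex and the call_xids helper are hypothetical, and it only handles the simple call-line layout seen here.

```python
import re

# Trimmed copies of lines from the trace above. For NFS over UDP, tcpdump
# prints the RPC transaction ID (XID) of a call where the client's source
# port would normally appear, i.e. "client.XID > server.nfs".
TRACE = [
    '11:57:46.159530 rain.ximian.priv.4040915765 > ogg.nfs: 140 lookup fh',
    '11:57:46.163988 ogg.nfs > rain.ximian.priv.4040915765: reply ok 128',
    '11:57:46.173585 rain.ximian.priv.4057692981 > ogg.nfs: 112 read fh',
    '11:57:56.666020 rain.ximian.priv.4074470197 > ogg.nfs: 112 read fh',
]

# Matches call lines only: "TIME host.XID > host.nfs: SIZE OP ..."
# Replies ("ogg.nfs > ...") don't match because "nfs" is not a number.
CALL_RE = re.compile(r"^\S+ \S+\.(\d+) > \S+\.nfs: \d+ (\w+)")


def call_xids(lines):
    """Return (operation, xid) for every NFS call line (hypothetical helper)."""
    calls = []
    for line in lines:
        m = CALL_RE.match(line)
        if m:
            calls.append((m.group(2), int(m.group(1))))
    return calls


for op, xid in call_xids(TRACE):
    # "#010x" gives "0x" plus 8 zero-padded hex digits, as Ethereal displays.
    print(f"{op:<7} {xid:>10} = {xid:#010x}")
```

On these lines it reports 0xf0db7b35 for the lookup, 0xf1db7b35 for the first batch of reads and 0xf2db7b35 for the second, so successive XIDs in this trace differ only in the top byte.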

11:57:46.865276 rain.ximian.priv.4057692981 > ogg.nfs: 112 read fh Unknown/1 4096 bytes @ 2408448 (DF)
11:57:46.907584 ogg > rain.ximian.priv: (frag 20544:1480@1480+)
11:57:46.909674 ogg.nfs > rain.ximian.priv.4057692981: reply ok 1472 read (frag 20544:1480@0+)
11:57:48.265276 rain.ximian.priv.4057692981 > ogg.nfs: 112 read fh Unknown/1 4096 bytes @ 2408448 (DF)
11:57:48.319183 ogg > rain.ximian.priv: (frag 20800:1480@1480+)
11:57:48.321456 ogg.nfs > rain.ximian.priv.4057692981: reply ok 1472 read (frag 20800:1480@0+)
11:57:51.065342 rain.ximian.priv.4057692981 > ogg.nfs: 112 read fh Unknown/1 4096 bytes @ 2408448 (DF)
11:57:51.114002 ogg > rain.ximian.priv: (frag 21056:1480@1480+)
11:57:51.115317 ogg.nfs > rain.ximian.priv.4057692981: reply ok 1472 read (frag 21056:1480@0+)

Then, it switches to a new XID, and the same set repeats itself.

11:57:56.666020 rain.ximian.priv.4074470197 > ogg.nfs: 112 read fh Unknown/1 4096 bytes @ 2408448 (DF)
11:57:56.720428 ogg > rain.ximian.priv: (frag 21312:1480@1480+)
11:57:56.722696 ogg.nfs > rain.ximian.priv.4074470197: reply ok 1472 read (frag 21312:1480@0+)
11:57:57.365349 rain.ximian.priv.4074470197 > ogg.nfs: 112 read fh Unknown/1 4096 bytes @ 2408448 (DF)
11:57:57.411418 ogg > rain.ximian.priv: (frag 21568:1480@1480+)
11:57:57.413422 ogg.nfs > rain.ximian.priv.4074470197: reply ok 1472 read (frag 21568:1480@0+)
11:57:58.765279 rain.ximian.priv.4074470197 > ogg.nfs: 112 read fh Unknown/1 4096 bytes @ 2408448 (DF)
11:57:58.814856 ogg > rain.ximian.priv: (frag 21824:1480@1480+)
11:57:58.817140 ogg.nfs > rain.ximian.priv.4074470197: reply ok 1472 read (frag 21824:1480@0+)
11:58:01.565281 rain.ximian.priv.4074470197 > ogg.nfs: 112 read fh Unknown/1 4096 bytes @ 2408448 (DF)
11:58:01.617302 ogg > rain.ximian.priv: (frag 22080:1480@1480+)
11:58:01.619220 ogg.nfs > rain.ximian.priv.4074470197: reply ok 1472 read (frag 22080:1480@0+)

... and then cp dies with an I/O error.
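The gaps between the repeated READ calls can also be read straight off the timestamps. A minimal Python sketch, with the (XID, timestamp) pairs copied by hand from the trace above; the CALLS table and the seconds and gaps helpers are hypothetical names used only for illustration.

```python
from datetime import datetime

# (XID, timestamp) of each READ call in the trace, copied by hand.
CALLS = [
    (4057692981, "11:57:46.173585"),
    (4057692981, "11:57:46.865276"),
    (4057692981, "11:57:48.265276"),
    (4057692981, "11:57:51.065342"),
    (4074470197, "11:57:56.666020"),
    (4074470197, "11:57:57.365349"),
    (4074470197, "11:57:58.765279"),
    (4074470197, "11:58:01.565281"),
]


def seconds(ts):
    """Convert an HH:MM:SS.ffffff tcpdump timestamp to seconds since midnight."""
    t = datetime.strptime(ts, "%H:%M:%S.%f")
    return t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6


def gaps(calls):
    """Return (xid, gap) for every call after the first, where gap is the
    time since the previous call, rounded to 0.1 s."""
    out = []
    for (_, prev), (xid, cur) in zip(calls, calls[1:]):
        out.append((xid, round(seconds(cur) - seconds(prev), 1)))
    return out


for xid, gap in gaps(CALLS):
    print(f"{xid:#010x}  +{gap:.1f}s")
```

This prints gaps of 0.7, 1.4 and 2.8 s within each XID and 5.6 s before the switch to the new XID, i.e. the client doubles its wait before each retransmission. That matches the retransmission scheme described in older nfs(5) man pages, where the first retransmission comes after timeo=7 tenths of a second, the timeout doubles each time, and a major timeout follows retrans=3 retransmissions; whether this client was actually using those defaults is an assumption.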
So, from looking at this, I'm going to blame the client-side NFS code
here -- especially since the file is perfectly fine on the server
itself. Alan Cox said that he wasn't aware of any NFS client-side
patches that have gone in since 2.4.2 came out.
Note that this is actually similar to the error I was seeing while
running my iPAQ with an NFS root filesystem -- certain files just give
I/O errors. This file is one of them; I reproduced the problem by
copying it to an ext2 filesystem and got the same I/O error.
So, this isn't XFS-related (whew!), but NFS on Linux does indeed suck.
:-P
- Vlad