Simon Matter wrote:
>
> Adrian Head wrote:
> >
> > Thanks for your reply Simon
> >
> > Yes, the softraid was fully synced before I started any test.
> >
> > The XFS patch I used to obtain these errors was
> > patch-2.4.9-xfs-2001-08-19 and the errors were:
> > Sep 9 05:13:46 ATLAS kernel: 02:86: rw=0, want=156092516, limit=360
> > Sep 9 05:13:46 ATLAS kernel: attempt to access beyond end of device
> >
> > When I used a later version of the XFS patch I had more descriptive
> > errors written to /var/log/messages:
> > Sep 10 10:14:57 ATLAS kernel: I/O error in filesystem ("md(9,0)")
> > meta-data dev 0x900 block 0x9802bdc
> > Sep 10 10:14:57 ATLAS kernel: (xlog_iodone") error 5 buf count 32768
> > Sep 10 10:14:57 ATLAS kernel: xfs_force_shutdown(md(9,0),0x2) called
> > from line 940 of file xfs_log.c. Return address - 0xd8cb66f8
> > Sep 10 10:14:57 ATLAS kernel: Log I/O Error Detected. Shutting down
> > filesystem: md(9,0)
> > Sep 10 10:14:57 ATLAS kernel: Please umount the filesystem, and rectify
> > the problem(s)
> > Sep 10 10:14:57 ATLAS kernel: xfs_force_shutdown(md(9,0),0x2) called
> > from line 714 of file xfs_log.c. Return address = 0xd8cb65d3
> > Sep 10 10:14:57 ATLAS kernel: attempt to access beyond end of device
> > Sep 10 10:14:57 ATLAS kernel: 02:82: rw=0, want=1602235696, limit=4
> >
> > I did think at the time that it might be XFS stomping all over the raid
> > code or the raid code stomping all over XFS. Now I'm not so sure, though,
> > as the 2.4.10-pre2-xfs-2001-09-02 patch never wrote any errors out at
> > all (please see my 2nd post for more info).
> >
> > Thanks for taking the time to test this on your own machine.
>
> I tried 20, 40 and 80 simultaneous cp jobs with no crash. Then I changed
> the file tree; the new tree has ~280M small files, 100b-50kb in size.
> When using 60 cp jobs the machine died. I could ping it but nothing
> more: no ssh, no console, no shutdown. I'll try some more tests tonight.
> I'll try the same with ext2 as well to make sure it's XFS and not Softraid.
Update: I tried the 60 cp jobs on an ext2 filesystem, and the system is
still alive, but 58 of the 60 cp jobs are hanging. Well, maybe the 2.4.3
kernel is a bit old now... I'll try some more tests.
Simon
>
> -Simon
>
> >
> > Adrian Head
> > Bytecomm P/L
> >
> > > -----Original Message-----
> > > From: Simon Matter [SMTP:simon.matter@xxxxxxxxxxxxxxxx]
> > > Sent: Monday, 10 September 2001 17:45
> > > To: adrian.head@xxxxxxxxxxxxxxx
> > > Cc: linux-xfs@xxxxxxxxxxx
> > > Subject: Re: Problems with many processes copying large
> > > directories across an XFS volume.
> > >
> > > Hi Adrian
> > >
> > > I did similar tests two months ago. I was having problems as well, but
> > > unfortunately I don't remember what it was exactly.
> > > First question: you created Softraid5; was the raid synced when you
> > > started the tests?
> > >
> > > > In the /var/log/messages log around the same time as the copy test I get
> > > > entries like:
> > > > Sep 9 05:13:46 ATLAS kernel: 02:86: rw=0, want=156092516, limit=360
> > > > Sep 9 05:13:46 ATLAS kernel: attempt to access beyond end of device
> > >
> > > This looks interesting. I don't know exactly what it means, but it
> > > looks to me like you managed to create a filesystem bigger than the
> > > raid volume? I got the very same error when I tried to restore data
> > > with xfsrestore from DAT (xfsrestore from DLT was fine). The issue is
> > > still open.
> > >
> > > I have a test system here with SoftRAID5 on 4 U160 SCSI disks. I'll
> > > try to kill it today with cp jobs.
> > >
> > > -Simon
> > >