Re: Problems with many processes copying large directories across an XFS

To: Adrian Head <adrian.head@xxxxxxxxxxxxxxx>, linux-xfs@xxxxxxxxxxx
Subject: Re: Problems with many processes copying large directories across an XFS volume.
From: Simon Matter <simon.matter@xxxxxxxxxxxxxxxx>
Date: Tue, 11 Sep 2001 07:43:47 +0200
>received: from mobile.sauter-bc.com (unknown [10.1.6.21]) by basel1.sauter-bc.com (Postfix) with ESMTP id EF65757306; Tue, 11 Sep 2001 07:43:47 +0200 (CEST)
Organization: Sauter AG, Basel
References: <D1F276F384A0D311A7C500A0C9E89B381980E1@herbie.local> <3B9CCE00.D704DC0B@ch.sauter-bc.com>
Sender: owner-linux-xfs@xxxxxxxxxxx
Simon Matter wrote:
> 
> Adrian Head wrote:
> >
> > Thanks for your reply Simon
> >
> > Yes the softraid was fully synced before I started any test.
> >
> > The XFS patch I used to obtain these errors was
> > patch-2.4.9-xfs-2001-08-19 and the errors were:
> > Sep 9 05:13:46 ATLAS kernel: 02:86: rw=0, want=156092516, limit=360
> > Sep 9 05:13:46 ATLAS kernel: attempt to access beyond end of device
> >
> > When I used a later version of the XFS patch I had more descriptive
> > errors written to /var/log/messages:
> > Sep 10 10:14:57 ATLAS kernel: I/O error in filesystem ("md(9,0)")
> > meta-data  dev 0x900 block 0x9802bdc
> > Sep 10 10:14:57 ATLAS kernel: (xlog_iodone") error  5 buf count 32768
> > Sep 10 10:14:57 ATLAS kernel:  xfs_force_shutdown(md(9,0),0x2) called
> > from line 940 of file xfs_log.c.  Return address - 0xd8cb66f8
> > Sep 10 10:14:57 ATLAS kernel: Log I/O Error  Detected. Shutting down
> > filesystem: md(9,0)
> > Sep 10 10:14:57 ATLAS kernel:  Please umount the filesystem, and rectify
> > the problem(s)
> > Sep 10 10:14:57 ATLAS kernel: xfs_force_shutdown(md(9,0),0x2) called
> > from line 714 of file  xfs_log.c. Return address = 0xd8cb65d3
> > Sep 10 10:14:57 ATLAS kernel: attempt  to access beyond end of device
> > Sep 10 10:14:57 ATLAS kernel: 02:82: rw=0,  want=1602235696, limit=4
> >
> > I did think at the time that it may have been issues with XFS stomping
> > all over raid code or raid code stomping all over XFS.  Although I am not
> > sure now, as the 2.4.10-pre2-xfs-2001-09-02 patch never wrote any errors
> > out at all. (please see my 2nd post for more info)
> >
> > Thanks for taking the time to test this on your own machine.
> 
> I tried 20, 40 and 80 simultaneous cp jobs with no crash. Then I changed the
> file tree and the new tree has ~280M small files with 100b-50kb size.
> When using 60 cp jobs the machine died. I could ping it but nothing
> more. No ssh, no console, no shutdown. I'll try some more tests tonight.
> I'll try the same with ext2 as well to make sure it's XFS and not Softraid.

Update: I tried the 60 cp jobs on an ext2 filesystem and the system is
still alive, but 58 of the 60 cp jobs are hanging. Well, maybe the 2.4.3
kernel is a bit old now... I'll try some more tests.
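
For anyone who wants to repeat this kind of test, here is a rough sketch of
how the parallel cp jobs could be started and checked from a script. It is
not the exact procedure used above: the function names, the copy-NNN target
naming and the timeout handling are all made up for illustration, and it
assumes a cp binary that accepts -r.

```python
import subprocess
from pathlib import Path


def start_cp_jobs(src, dest_root, jobs):
    """Start `jobs` concurrent `cp -r` processes, each copying `src` to its own target."""
    dest_root = Path(dest_root)
    dest_root.mkdir(parents=True, exist_ok=True)
    procs = []
    for i in range(jobs):
        target = dest_root / f"copy-{i:03d}"  # hypothetical naming scheme
        procs.append(subprocess.Popen(["cp", "-r", str(src), str(target)]))
    return procs


def wait_for_jobs(procs, timeout=None):
    """Wait for each job in turn; the timeout (seconds) applies to each wait.

    Returns one exit code per job, or None for a job that had not
    finished when its wait timed out (a possible hang).
    """
    codes = []
    for proc in procs:
        try:
            codes.append(proc.wait(timeout=timeout))
        except subprocess.TimeoutExpired:
            codes.append(None)
    return codes
```

Something like wait_for_jobs(start_cp_jobs("/data/tree", "/mnt/test", 60),
timeout=3600) would then give a list where None entries mark the jobs that
are still hanging. The paths here are placeholders.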

Simon


> 
> -Simon
> 
> >
> > Adrian Head
> > Bytecomm P/L
> >
> > > -----Original Message-----
> > > From: Simon Matter [SMTP:simon.matter@xxxxxxxxxxxxxxxx]
> > > Sent: Monday, 10 September 2001 17:45
> > > To:   adrian.head@xxxxxxxxxxxxxxx
> > > Cc:   linux-xfs@xxxxxxxxxxx
> > > Subject:      Re: Problems with many processes copying large
> > > directories across an XFS  volume.
> > >
> > > Hi Adrian
> > >
> > > I did similar tests two months ago. I was having problems as well but
> > > unfortunately I don't remember what it was exactly.
> > > First question: You created Softraid5, was the raid synced when you
> > > started the tests?
> > >
> > > > In the /var/log/messages log around the same time as the copy test I get
> > > > entries like:
> > > > Sep 9 05:13:46 ATLAS kernel: 02:86: rw=0, want=156092516, limit=360
> > > > Sep 9 05:13:46 ATLAS kernel: attempt to access beyond end of device
> > >
> > > This looks interesting. I don't know what this means exactly but it
> > > looks to me like you managed to create a filesystem bigger than the
> > > raid volume was? I got the very same error when I tried to restore data
> > > with xfsrestore from DAT (xfsrestore from DLT was fine). The issue is
> > > still open.
> > >
> > > I have a test system here with SoftRAID5 on 4 U160 SCSI disks. I'll
> > > try to kill it today with cp jobs.
> > >
> > > -Simon
> > >


