Re: Problems with many processes copying large directories across an XFS volume.

To: Adrian Head <adrian.head@xxxxxxxxxxxxxxx>, linux-xfs@xxxxxxxxxxx
Subject: Re: Problems with many processes copying large directories across an XFS volume.
From: Simon Matter <simon.matter@xxxxxxxxxxxxxxxx>
Date: Thu, 13 Sep 2001 10:44:28 +0200
Organization: Sauter AG, Basel
References: <D1F276F384A0D311A7C500A0C9E89B381980E1@herbie.local> <3B9CCE00.D704DC0B@ch.sauter-bc.com> <3B9DA493.11CD4C16@ch.sauter-bc.com>
Sender: owner-linux-xfs@xxxxxxxxxxx
Simon Matter wrote:
> 
> Simon Matter wrote:
> >
> > Adrian Head wrote:
> > >
> > > Thanks for your reply Simon
> > >
> > > Yes the softraid was fully synced before I started any test.
> > >
> > > The XFS patch I used to obtain these errors was
> > > patch-2.4.9-xfs-2001-08-19 and the errors were:
> > > Sep 9 05:13:46 ATLAS kernel: 02:86: rw=0, want=156092516, limit=360
> > > Sep 9 05:13:46 ATLAS kernel: attempt to access beyond end of device
> > >
> > > When I used a later version of the XFS patch I had more descriptive
> > > errors written to /var/log/messages:
> > > Sep 10 10:14:57 ATLAS kernel: I/O error in filesystem ("md(9,0)")
> > > meta-data  dev 0x900 block 0x9802bdc
> > > Sep 10 10:14:57 ATLAS kernel: (xlog_iodone") error  5 buf count 32768
> > > Sep 10 10:14:57 ATLAS kernel:  xfs_force_shutdown(md(9,0),0x2) called
> > > from line 940 of file xfs_log.c.  Return address - 0xd8cb66f8
> > > Sep 10 10:14:57 ATLAS kernel: Log I/O Error  Detected. Shutting down
> > > filesystem: md(9,0)
> > > Sep 10 10:14:57 ATLAS kernel:  Please umount the filesystem, and rectify
> > > the problem(s)
> > > Sep 10 10:14:57 ATLAS kernel: xfs_force_shutdown(md(9,0),0x2) called
> > > from line 714 of file  xfs_log.c. Return address = 0xd8cb65d3
> > > Sep 10 10:14:57 ATLAS kernel: attempt  to access beyond end of device
> > > Sep 10 10:14:57 ATLAS kernel: 02:82: rw=0,  want=1602235696, limit=4
> > >
> > > I did think at the time that it may have been an issue with XFS stomping
> > > all over the raid code or the raid code stomping all over XFS.  Although I'm
> > > not sure now, as the 2.4.10-pre2-xfs-2001-09-02 patch never wrote any errors
> > > out at all. (please see my 2nd post for more info)
> > >
> > > Thanks for taking the time to test this on your own machine.
> >
> > I tried 20, 40 and 80 simultaneous cp jobs with no crash. Then I changed the
> > file tree and the new tree has ~280M small files of 100b-50kb size.
> > When using 60 cp jobs the machine died. I could ping it but nothing
> > more. No ssh, no console, no shutdown. I'll try some more tests tonight. I'll
> > try the same with ext2 as well to make sure it's XFS and not Softraid.
> 
> Update: I tried the 60 cp jobs on an ext2 filesystem and the system is
> still alive, but 58 of the 60 cp jobs are hanging. Well, maybe the 2.4.3
> kernel is a bit old now... I'll try some more tests.

I have just installed the 2.4.10-pre2 kernel from
http://rpms.aicompro.net/ and ran my stress test with 60 instances of
cp, and it crashed as well. Not good. I'm not sure whether kdb is in
those RPMs, but when the machine hung I couldn't get into kdb (the sysctl
switch was on). Usually you can unblank the console by pressing Shift or
Num Lock, but this time it was really dead. I'm now running the
same test on a non-RAID partition.
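
A stress test of this kind can be approximated with a short script. The
following is only a sketch, not the script used for the runs described
above: the function name run_parallel_copies, the copyNNN destination
names and the shared timeout are all assumptions made for illustration.
It starts all the cp jobs at once and then reports, per job, either the
exit status or None for a job that had not finished by the deadline.

```python
import subprocess
import time
from pathlib import Path


def run_parallel_copies(src, dest_root, jobs=60, timeout=3600.0):
    """Run `jobs` simultaneous `cp -R src dest_root/copyNNN` processes.

    Returns one entry per job: the cp exit status, or None if the job
    had not finished `timeout` seconds after all jobs were started.
    """
    dest_root = Path(dest_root)
    dest_root.mkdir(parents=True, exist_ok=True)

    # Start every copy before waiting on any, so they really overlap.
    procs = [
        subprocess.Popen(["cp", "-R", str(src), str(dest_root / f"copy{i:03d}")])
        for i in range(jobs)
    ]

    # One deadline shared by all jobs, not a fresh timeout per job.
    deadline = time.monotonic() + timeout
    results = []
    for proc in procs:
        remaining = max(0.0, deadline - time.monotonic())
        try:
            results.append(proc.wait(timeout=remaining))
        except subprocess.TimeoutExpired:
            # Best effort only: a cp blocked inside the kernel may not
            # react to the signal, so we do not wait for it to exit.
            proc.kill()
            results.append(None)
    return results
```

For example, run_parallel_copies("/path/to/tree", "/mnt/test/out", jobs=60)
(both paths are placeholders) returns a list of 60 entries; counting the
None entries gives the number of hung jobs, which makes a result like
"58 of the 60 cp jobs are hanging" easy to record. If the whole machine
locks up, the script of course never gets to report anything.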

Simon

> 
> Simon
> 
> >
> > -Simon
> >
> > >
> > > Adrian Head
> > > Bytecomm P/L
> > >
> > > > -----Original Message-----
> > > > From: Simon Matter [SMTP:simon.matter@xxxxxxxxxxxxxxxx]
> > > > Sent: Monday, 10 September 2001 17:45
> > > > To:   adrian.head@xxxxxxxxxxxxxxx
> > > > Cc:   linux-xfs@xxxxxxxxxxx
> > > > Subject:      Re: Problems with many processes copying large
> > > > directories across an XFS  volume.
> > > >
> > > > Hi Adrian
> > > >
> > > > I did similar tests two months ago. I was having problems as well but
> > > > unfortunately I don't remember what it was exactly.
> > > > First question: You created Softraid5, was the raid synced when you
> > > > started the tests?
> > > >
> > > > > In the /var/log/messages log around the same time as the copy test I
> > > > get
> > > > > entries like:
> > > > > Sep 9 05:13:46 ATLAS kernel: 02:86: rw=0, want=156092516, limit=360
> > > > > Sep 9 05:13:46 ATLAS kernel: attempt to access beyond end of device
> > > >
> > > > This looks interesting. I don't know what this means exactly but it
> > > > looks to me like you managed to create a filesystem bigger than the
> > > > raid
> > > > volume was? I got the very same error when I tried to restore data
> > > > with
> > > > xfsrestore from DAT (xfsrestore from DLT was fine). The issue is still
> > > > open.
> > > >
> > > > I have a test system here with SoftRAID5 on 4 U160 SCSI disks. I'll
> > > > try
> > > > to kill it today with cp jobs.
> > > >
> > > > -Simon
> > > >

-- 
Simon Matter              Tel:  +41 61 695 57 35
Fr.Sauter AG / CIT        Fax:  +41 61 695 53 30
Im Surinam 55
CH-4016 Basel             [mailto:simon.matter@xxxxxxxxxxxxxxxx]


