
Re: Problems with many processes copying large directories across an XFS volume.

To: Adrian Head <adrian.head@xxxxxxxxxxxxxxx>
Subject: Re: Problems with many processes copying large directories across an XFS volume.
From: Simon Matter <simon.matter@xxxxxxxxxxxxxxxx>
Date: Mon, 10 Sep 2001 10:57:34 +0200
Cc: linux-xfs@xxxxxxxxxxx
Organization: Sauter AG, Basel
References: <D1F276F384A0D311A7C500A0C9E89B381980E1@herbie.local>
Sender: owner-linux-xfs@xxxxxxxxxxx
Adrian Head wrote:
> 
> Thanks for your reply Simon
> 
> Yes the softraid was fully synced before I started any test.

When I started playing around with XFS I used RAID5 and did stress tests
while the disks were syncing. The syncing failed several times and I was
afraid it was not safe to stress the raid while it was syncing. Fortunately
that turned out not to be true: the syncs had aborted because of bad disks.
So there is no problem even if the raid is not synced.
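
A quick way to confirm the array is fully synced before a run is to look at
/proc/mdstat, which shows a resync or recovery line while a rebuild is in
progress. A minimal sketch, assuming a standard Linux software RAID setup:

# Minimal sketch: report whether any md array is still resyncing, by
# scanning /proc/mdstat for an in-progress resync/recovery line.
# Assumes Linux software RAID; nothing here is specific to XFS.

def raid_is_syncing(mdstat_path="/proc/mdstat"):
    with open(mdstat_path) as f:
        status = f.read()
    # While a rebuild or initial sync runs, /proc/mdstat contains lines
    # like "[==>....]  resync = 12.3% ..." or "... recovery = ...".
    return "resync" in status or "recovery" in status

if __name__ == "__main__":
    if raid_is_syncing():
        print("md array still syncing - maybe wait before stress testing")
    else:
        print("no resync/recovery in progress")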

> 
> The XFS patch I used to obtain these errors was
> patch-2.4.9-xfs-2001-08-19 and the errors were:
> Sep 9 05:13:46 ATLAS kernel: 02:86: rw=0, want=156092516, limit=360
> Sep 9 05:13:46 ATLAS kernel: attempt to access beyond end of device
> 
> When I used a later version of the XFS patch I had more descriptive
> errors written to /var/log/messages:
> Sep 10 10:14:57 ATLAS kernel: I/O error in filesystem ("md(9,0)")
> meta-data dev 0x900 block 0x9802bdc
> Sep 10 10:14:57 ATLAS kernel: ("xlog_iodone") error 5 buf count 32768
> Sep 10 10:14:57 ATLAS kernel: xfs_force_shutdown(md(9,0),0x2) called
> from line 940 of file xfs_log.c. Return address = 0xd8cb66f8
> Sep 10 10:14:57 ATLAS kernel: Log I/O Error Detected. Shutting down
> filesystem: md(9,0)
> Sep 10 10:14:57 ATLAS kernel: Please umount the filesystem, and rectify
> the problem(s)
> Sep 10 10:14:57 ATLAS kernel: xfs_force_shutdown(md(9,0),0x2) called
> from line 714 of file xfs_log.c. Return address = 0xd8cb65d3
> Sep 10 10:14:57 ATLAS kernel: attempt to access beyond end of device
> Sep 10 10:14:57 ATLAS kernel: 02:82: rw=0, want=1602235696, limit=4
> 
> I did think at the time that it may have been an issue with XFS stomping
> all over the raid code or the raid code stomping all over XFS, although
> I'm not sure now as the 2.4.10-pre2-xfs-2001-09-02 patch never wrote any
> errors out at all. (Please see my 2nd post for more info.)
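
For context on the "attempt to access beyond end of device" lines quoted
above: rw=0 is a read, and "want" and "limit" appear to be the last 1K block
the request would touch and the size of the device in 1K blocks, so
want=156092516 against limit=360 means a read reaching far past the end of
the device. A rough sketch of that kind of bounds check, as an illustration
of the idea only and not the actual 2.4 block-layer code:

# Illustration only (not the real kernel code): the sort of sanity check
# behind "attempt to access beyond end of device". Treating 'want' as the
# last 1K block the request touches and 'limit' as the device size in 1K
# blocks is an assumption about the logged numbers, not something the log
# itself states.

def check_request(rw, want, limit):
    """Return True if the request fits on the device, else log and refuse."""
    if want > limit:
        print("attempt to access beyond end of device")
        print(f"rw={rw}, want={want}, limit={limit}")
        return False
    return True

# The numbers from the first pair of log lines quoted in this thread:
check_request(rw=0, want=156092516, limit=360)   # rejected: 156092516 > 360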

I don't use anything special here, no cutting-edge stuff, just the
good old 2.4.3-XFS from the RH-7.1-XFS installer.

> 
> Thanks for taking the time to test this on your own machine.

The test with 20 simultaneous cp processes has just finished successfully.
I'm now trying with 40 cp processes. Due to limited disk space I copy only
470M per process.
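
For reference, this kind of parallel-copy test boils down to starting N cp
processes at once and waiting for all of them. A minimal sketch, with
placeholder paths and job count rather than the exact commands used:

# Minimal sketch of the parallel-copy stress test described above:
# start NUM_JOBS cp processes at once, each copying its own tree,
# then wait for all of them. Paths and the job count are placeholders.
import subprocess

NUM_JOBS = 40            # e.g. 20 or 40 simultaneous copies
SRC = "/data/testdir"    # hypothetical ~470M source directory
DST = "/mnt/xfs"         # hypothetical mount point of the XFS volume

procs = [subprocess.Popen(["cp", "-a", SRC, f"{DST}/copy{i}"])
         for i in range(NUM_JOBS)]

for p in procs:
    p.wait()

failed = sum(1 for p in procs if p.returncode != 0)
print("all copies finished" if failed == 0 else f"{failed} copies failed")

The errors discussed in this thread showed up in /var/log/messages while
copies like these were running.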

-Simon

> 
> Adrian Head
> Bytecomm P/L
> 
> > -----Original Message-----
> > From: Simon Matter [SMTP:simon.matter@xxxxxxxxxxxxxxxx]
> > Sent: Monday, 10 September 2001 17:45
> > To:   adrian.head@xxxxxxxxxxxxxxx
> > Cc:   linux-xfs@xxxxxxxxxxx
> > Subject:      Re: Problems with many processes copying large
> > directories across an XFS volume.
> >
> > Hi Adrian
> >
> > I did similar tests two months ago. I was having problems as well but
> > unfortunately I don't remember what it was exactly.
> > First question: You created Softraid5, was the raid synced when you
> > started the tests?
> >
> > > In the /var/log/messages log around the same time as the copy test I
> > get
> > > entries like:
> > > Sep 9 05:13:46 ATLAS kernel: 02:86: rw=0, want=156092516, limit=360
> > > Sep 9 05:13:46 ATLAS kernel: attempt to access beyond end of device
> >
> > This looks interesting. I don't know what this means exactly, but it
> > looks to me like you managed to create a filesystem bigger than the
> > raid volume? I got the very same error when I tried to restore data
> > with xfsrestore from DAT (xfsrestore from DLT was fine). The issue is
> > still open.
> >
> > I have a test system here with SoftRAID5 on 4 U160 SCSI disks. I'll
> > try to kill it today with cp jobs.
> >
> > -Simon
> >


