
Re: xfs barfing on TB fs.

To: Nathan Scott <nathans@xxxxxxx>
Subject: Re: xfs barfing on TB fs.
From: Greg Whynott <greg@xxxxxxxxxxxxxxxxxx>
Date: Mon, 17 Nov 2003 12:32:09 -0500
Cc: linux-xfs@xxxxxxxxxxx
Organization: Calibre Digital Pictures
References: <3FAA9F75.7915E777@xxxxxxxxxxxxxxxxxx> <11738.1068157873@xxxxxxxxxxxxxxxxxxxxx> <20031106233645.GD782@frodo>
Sender: linux-xfs-bounce@xxxxxxxxxxx
Hi Nathan et al.,

        I have submitted a Bugzilla report (ticket 287) which may have
more details.

I'll do my best here to depict the setup and config steps.

The hardware:

. IBM xSeries 335, dual P4, 1GB memory, root drives in a mirror set.
. 1 dual-port QLA2300 FC HBA.
. 3 gigabit ethernet ports: 2 built in and bonded to the switch (in Cisco
speak, an EtherChannel), the third an option card; a sketch of the bonding
setup follows this list.
. 1 4TB JetStor III IDE hardware RAID 5, connected to the server via 2Gb FC
point-to-point.
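
For completeness, bonding on a RH9-era box goes through modules.conf plus
the initscripts. A minimal sketch, assuming interface names eth0/eth1, a
round-robin mode, and a placeholder address (the actual mode and addresses
on this box may differ):

    # /etc/modules.conf
    alias bond0 bonding
    options bond0 mode=0 miimon=100   # balance-rr; switch side is a static EtherChannel

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    IPADDR=10.0.0.5                   # placeholder address
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none

    # /etc/sysconfig/network-scripts/ifcfg-eth0 (ifcfg-eth1 identical except DEVICE)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none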


The Bits:

. RH9 base with current updates, excluding the kernel.
. Manually compiled kernel (from kernel.org) with the XFS patches applied
(from oss.sgi.com); same kernel/patch set as used on other machines in
production.
. Manually compiled qla2300 module.
. Manually compiled Samba.
. The array has 2 partitions, created via the hardware controller within
the chassis (RAID 5 over 14 300GB disks).
. Partitions formatted to 2TB and 1.6TB.
. fdisk was not used at all in the process, neither to further partition
nor to tag the partitions; I ran mkfs.xfs on the raw partitions.
. Created the XFS filesystems using default options (a sketch of these
steps follows this list).
. As Keith correctly mentioned, the kernel was recompiled to include
highmem support without cleaning the build area first. I now know why I'll
never do that again.
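
To make the recipe concrete, the filesystem step amounted to something
like the following; the device names are placeholders, since I didn't note
the actual sd letters here:

    # mkfs.xfs straight onto the raw LUNs exported by the JetStor
    # controller, no fdisk partition table, all defaults:
    mkfs.xfs /dev/sdb    # 2TB production partition
    mkfs.xfs /dev/sdc    # 1.6TB backup partition

And the kernel rebuild lesson, as a sketch for a 2.4 tree:

    cp .config /tmp/config.save    # save the config first
    make mrproper                  # actually clean the build area
    cp /tmp/config.save .config
    make oldconfig && make dep     # 2.4 kernels still need 'make dep'
    make bzImage modules modules_install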



The 2 bonded interfaces sit on a different network than the single gigabit
interface. All links run at 1000Mbps full duplex. Let's call the bonded
interface the backup interface and the other the production interface.
Over the backup interface I run the rsync jobs and the tape backups; rsync
syncs a few large (several hundred GB) production areas onto the 1.6TB
partition a few times a day. The production interface is the one clients
hit to access the second, 2TB partition.
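
The rsync jobs themselves are nothing exotic; a minimal sketch of one
(host name, paths, and flags are illustrative, not the exact cron entries):

    # pull a production area onto the 1.6TB partition over the backup interface
    rsync -a --delete prodserver:/vol/projects/ /backup/projects/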

The error happens (though not consistently) when an rsync job (or two) has
fired off during production hours while clients are also accessing data.
The majority of files are 1.2MB or larger: roughly 1.2MB each for NTSC and
12MB each for the HD projects, with about a 60/40 split between the two.
The last time this happened while I was paying attention, smbstatus showed
some 300 open files from Samba clients (EXCLUSIVE+BATCH oplocks), with the
rsync job running as well.
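
For anyone wanting to watch for the same conditions, the Samba side is
easy to check with smbstatus; a rough sketch (exact output format varies
by Samba version):

    # list files locked by clients, including the oplock type column
    smbstatus -L

    # quick count of entries held with an exclusive/batch oplock
    smbstatus -L | grep -c EXCLUSIVE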

I have included an attachment (foo.tar.gz) with some additional info:
dmesg output, message logs showing the frequency of the error, the modules
loaded, and the kernel config used for the build. If there is anything
else I have missed, please let me know.
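
In case it is useful, the attachment was pulled together from the usual
places; roughly the following, with the filenames being just what I
happened to use:

    dmesg > dmesg.out
    lsmod > modules.out
    grep -i xfs /var/log/messages > messages-xfs.out
    cp /usr/src/linux/.config kernel-config
    tar czf foo.tar.gz dmesg.out modules.out messages-xfs.out kernel-config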


take care,
greg





Nathan Scott wrote:

> And if you have a reproducible test case showing this problem,
> I would _really_ like to hear the details, please -- an exact
> recipe, from go (mkfs) to woe would be extremely helpful.
> 
> Thanks for reporting the problem, btw.




-- 
UNIX is user friendly, it's just selective about who its friends are.

Attachment: foo.tar.gz
Description: GNU Zip compressed data
