
Re: XFS errors on large Infiniband fileserver setup

To: Christian Herzog <horeizo@xxxxxxxxxxxx>
Subject: Re: XFS errors on large Infiniband fileserver setup
From: Emmanuel Florac <eflorac@xxxxxxxxxxxxxx>
Date: Fri, 24 Sep 2010 15:19:18 +0200
Cc: xfs@xxxxxxxxxxx
In-reply-to: <2276da2491527ca8044fa1daec496b48@xxxxxxxxxxxx>
Organization: Intellique
References: <29252416bd0d9dc973a909e411dbec6a@xxxxxxxxxxxx> <20100923235355.GO2614@dastard> <2276da2491527ca8044fa1daec496b48@xxxxxxxxxxxx>
On Fri, 24 Sep 2010 07:41:53 +0200
Christian Herzog <horeizo@xxxxxxxxxxxx> wrote:

>  We start off
> with 52T and can easily add additional disk units to the Infiniband
> switch.

I don't know if using iSCSI over InfiniBand is optimal. The problem is
that you plan to expand your existing XFS filesystem by leaps and
bounds, up to some extremely large size, through a simple RAID-0-like
aggregation. That's really fragile: lose one disk unit and the whole
filesystem goes with it.
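
To put a rough number on "fragile", here is a minimal sketch. It assumes
that disk units fail independently and that losing any single unit loses
the whole aggregated filesystem. The 2% per-unit failure rate is a
made-up figure for illustration, not a measurement, and the function
name is mine.

```python
def p_any_unit_fails(n_units: int, p_unit: float) -> float:
    """Probability that at least one of n independent units fails.

    With RAID-0-like aggregation, one failed unit is assumed to take
    the whole filesystem down, so this is also the probability of
    losing the filesystem over the same period.
    """
    return 1.0 - (1.0 - p_unit) ** n_units


if __name__ == "__main__":
    # Hypothetical per-unit failure probability over some period.
    P_UNIT = 0.02
    for n in (1, 4, 8, 16):
        print(f"{n:2d} units: {p_any_unit_fails(n, P_UNIT):.1%} "
              "chance of losing the filesystem")
```

The risk grows with every unit you add, which is the opposite of what
you want from a filesystem meant to be expanded.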

It's a configuration that's supposed to work well with something like
ZFS, though in real-life setups large RAID-Z iSCSI clusters aren't so
great (no, I won't tell who recently sent a whole 2 PB cluster back to
Sun, but that happened in 2010 :)

For similar setups, I used PVFS2 by aggregating 40 TB nodes. PVFS2 is
known to scale up to petabytes, and (contrary to XFS over RAID-0) is
extremely tolerant of node failure (though it is not redundant): if a
node crashes, cluster I/O may freeze (though write activity can
usually go on), but it resumes instantly when the failed node is revived.

However, PVFS isn't made for general-purpose file sharing (though it
works with both Samba and NFS); it really flies when used with
properly set up applications (MPI-IO). It's tailored for scientific
work and heavy computation clusters.

In contrast with Lustre, PVFS2 is very easy to set up, and very easy
to extend if you planned it from the start (Lustre is a fantastic PITA
to set up and administer, and don't even talk about NFS sharing).

So I would set up a storage cluster this way: each storage node is a
PVFS server, and the PVFS data resides on an XFS filesystem (officially
recommended by PVFS developers anyway).
You can expand the PVFS filesystem either by enlarging the XFS on the
storage nodes, or by adding new independent storage nodes.

Each storage node can be a PVFS client too, and use its computing power
to crunch data. 

As I said, the main question is how you plan to make the space
available to client systems. You can use NFS/CIFS for ease of use,
desktop access, etc., but performance will be low. Native PVFS
performance, however, can be huge over InfiniBand (in the several GB/s
range), and the more storage nodes you add, the more performance you get.
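
To illustrate why throughput grows with the node count, here is a
minimal sketch. It assumes file data is striped round-robin across the
storage nodes, which is my understanding of how PVFS2 lays out files by
default. The 64 KiB strip size and the function names are illustrative
assumptions, not taken from any real configuration.

```python
STRIP_SIZE = 64 * 1024  # assumed strip size in bytes (illustrative)


def server_for_offset(offset: int, n_servers: int,
                      strip_size: int = STRIP_SIZE) -> int:
    """Index of the server holding the byte at `offset`,
    under simple round-robin striping."""
    return (offset // strip_size) % n_servers


def bytes_per_server(length: int, n_servers: int,
                     strip_size: int = STRIP_SIZE) -> list:
    """Bytes each server supplies for a read of `length` bytes
    starting at offset 0."""
    counts = [0] * n_servers
    offset = 0
    while offset < length:
        # Read up to one strip at a time; the last chunk may be shorter.
        chunk = min(strip_size, length - offset)
        counts[server_for_offset(offset, n_servers, strip_size)] += chunk
        offset += chunk
    return counts


if __name__ == "__main__":
    # A 1 MiB read spread over 4 and then 8 storage nodes.
    for n in (4, 8):
        print(n, "servers:", bytes_per_server(1024 * 1024, n))
```

Under this model a large read is split evenly among the nodes, so each
node moves less data per request and the transfers can run in parallel.
It also shows why I/O stalls when a node is down: almost any large file
has strips on every node.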

Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |   <eflorac@xxxxxxxxxxxxxx>
                    |   +33 1 78 94 84 02
