
Re: xfs_fsr, sunit, and swidth

To: stan@xxxxxxxxxxxxxxxxx
Subject: Re: xfs_fsr, sunit, and swidth
From: Dave Hall <kdhall@xxxxxxxxxxxxxx>
Date: Thu, 14 Mar 2013 10:59:27 -0400
Cc: "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <5141C8C1.2080903@xxxxxxxxxxxxxxxxx>
References: <5140C147.7070205@xxxxxxxxxxxxxx> <514113C6.9090602@xxxxxxxxxxxxxxxxx> <514153ED.3000405@xxxxxxxxxxxxxx> <5141C1FC.4060209@xxxxxxxxxxxxxxxxx> <5141C8C1.2080903@xxxxxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.16) Gecko/20121215 Icedove/3.0.11

Dave Hall
Binghamton University
kdhall@xxxxxxxxxxxxxx
607-760-2328 (Cell)
607-777-4641 (Office)

On 03/14/2013 08:55 AM, Stan Hoeppner wrote:
> Yes, please provide the output of the following commands:
>
> ~$ uname -a
  
Linux decoy 3.2.0-0.bpo.4-amd64 #1 SMP Debian 3.2.35-2~bpo60+1 x86_64 GNU/Linux
> ~$ grep xfs /etc/fstab
    
LABEL=backup        /infortrend    xfs    inode64,noatime,nodiratime,nobarrier    0    0
(cat /proc/mounts:  /dev/sdb1 /infortrend xfs rw,noatime,nodiratime,attr2,delaylog,nobarrier,inode64,noquota 0 0)


Note that there is also a second XFS filesystem on a separate 3ware RAID card, but the I/O traffic on that one is fairly low. It is used as a staging area for a Debian mirror that is hosted on another server.
> ~$ xfs_info /dev/[mount-point]
    
# xfs_info /dev/sdb1
meta-data=/dev/sdb1              isize=256    agcount=26, agsize=268435455 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=6836364800, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

> ~$ df /dev/[mount_point]
    
# df /dev/sdb1
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdb1            27343372288 20432618356 6910753932  75% /infortrend
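
As a sanity check, the df total can be reproduced from the xfs_info numbers. The sketch below assumes that df reports the data blocks minus the internal log, which is what these particular figures are consistent with. The variable names are only illustrative.

```shell
# Values copied from the xfs_info output above.
bsize=4096            # data block size in bytes
dblocks=6836364800    # data blocks
logblocks=521728      # internal log blocks
agcount=26            # number of allocation groups
agsize=268435455      # blocks per full allocation group

# Size in 1K blocks: data blocks minus the internal log.
df_1k=$(( (dblocks - logblocks) * bsize / 1024 ))
echo "expected df 1K-blocks: $df_1k"

# The last allocation group only holds what is left over after 25 full ones.
last_ag=$(( dblocks - (agcount - 1) * agsize ))
echo "blocks in last AG: $last_ag"
```

The first figure comes out to 27343372288, matching the 1K-blocks column from df, and the last allocation group holds 125478425 blocks, a bit under half of a full one.
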
> ~$ df -i /dev/[mount_point]
    
# df -i /dev/sdb1
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sdb1            5469091840 1367746380 4101345460   26% /infortrend
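
The inode total shown by df -i also lines up with the xfs_info output. A small sketch, assuming the total is simply the imaxpct share of the data blocks multiplied by the number of inodes that fit in a block:

```shell
# Values copied from the xfs_info output above.
bsize=4096            # data block size in bytes
isize=256             # inode size in bytes
dblocks=6836364800    # data blocks
imaxpct=5             # maximum percentage of space usable for inodes

inodes_per_block=$(( bsize / isize ))
max_inodes=$(( dblocks * imaxpct / 100 * inodes_per_block ))
echo "inodes per block: $inodes_per_block"
echo "maximum inodes:   $max_inodes"
```

This gives 16 inodes per block and 5469091840 inodes in total, the same as the Inodes column above.
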

> ~$ xfs_db -r -c freesp /dev/[mount-point]
    
# xfs_db -r -c freesp /dev/sdb1
   from      to extents  blocks    pct
      1       1  832735  832735   0.05
      2       3  432183 1037663   0.06
      4       7  365573 1903965   0.11
      8      15  352402 3891608   0.23
     16      31  332762 7460486   0.43
     32      63  300571 13597941   0.79
     64     127  233778 20900655   1.21
    128     255  152003 27448751   1.59
    256     511  112673 40941665   2.37
    512    1023   82262 59331126   3.43
   1024    2047   53238 76543454   4.43
   2048    4095   34092 97842752   5.66
   4096    8191   22743 129915842   7.52
   8192   16383   14453 162422155   9.40
  16384   32767    8501 190601554  11.03
  32768   65535    4695 210822119  12.20
  65536  131071    2615 234787546  13.59
 131072  262143    1354 237684818  13.76
 262144  524287     470 160228724   9.27
 524288 1048575      74 47384798   2.74
1048576 2097151       1 2097122   0.12
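
To condense the histogram into one line, something like the following could be used. It is only a sketch: summarize_freesp is a made-up helper name, and it assumes the five-column layout shown above (from, to, extents, blocks, pct).

```shell
# summarize_freesp MIN: read `xfs_db -r -c freesp` output on stdin and print
# the total free blocks plus the share sitting in histogram buckets whose
# lower bound ("from" column) is at least MIN blocks.
summarize_freesp() {
    awk -v min="$1" '
        $1 ~ /^[0-9]+$/ && NF >= 4 {      # data rows only; skips the header
            total += $4
            if ($1 >= min) big += $4
        }
        END {
            if (total > 0)
                printf "total=%.0f big=%.0f pct=%.2f\n", total, big, 100 * big / total
        }'
}
```

For example, `xfs_db -r -c freesp /dev/sdb1 | summarize_freesp 1024` would report how much of the free space is in extents of 1024 blocks (4 MiB at this block size) or larger.
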

> 
> Also please provide the make/model of the RAID controller, the write
> cache size and if it is indeed enabled and working, as well as any
> errors, if any, logged by the controller in dmesg or elsewhere in Linux,
> or in the controller firmware.
> 
    
The RAID box is an Infortrend S16S-G1030 with 512MB cache and a fully functional battery. I couldn't find any details about the internal RAID implementation used by Infortrend. The array is SAS attached to an LSI HBA (SAS2008 PCI-Express Fusion-MPT SAS-2).

The system hardware is a SuperMicro quad 8-core Xeon E7-4820 2.0GHz with 128 GB of RAM, hyper-threading enabled.  (This is something that I inherited.  There is no doubt that it is overkill.)

Another bit of information that you didn't ask about is the I/O scheduler algorithm.  I just checked and found it set to 'cfq', although I thought I had set it to 'noop' via a kernel parameter in GRUB.
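
For reference, the active scheduler can be read from sysfs, where it is the name shown in square brackets. A minimal sketch follows. active_scheduler is a made-up helper name, and the device name sdb is an assumption based on the /dev/sdb1 output earlier in this message.

```shell
# active_scheduler LINE: print the scheduler marked with [brackets] in a line
# formatted like the contents of /sys/block/<dev>/queue/scheduler,
# e.g. "noop deadline [cfq]".
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

dev=sdb   # assumption: the Infortrend array is /dev/sdb
sched_file=/sys/block/$dev/queue/scheduler
if [ -r "$sched_file" ]; then
    echo "$dev: $(active_scheduler "$(cat "$sched_file")")"
fi

# To switch at runtime (as root), without rebooting:
#   echo noop > /sys/block/sdb/queue/scheduler
```

On a kernel of this vintage the boot-time equivalent is the elevator=noop parameter. Checking /proc/cmdline would show whether it actually made it onto the kernel command line.
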

Also, some observations about the cp -al:  In parallel with investigating hardware/OS/filesystem issues I have done some experiments with cp -al.  It hurts to have 64 cores available and see cp -al running the wheels off just one, with a couple of others slightly active with system-level duties.  So I tried some experiments where I copied smaller segments of the file tree in parallel (using make -j).  I haven't had the chance to fully play this out, but these parallel cp invocations completed very quickly.  So it would appear that the cp command itself may bog down with such a large file tree.  I haven't had a chance to tear apart the source code or do any profiling to see if there are any obvious problems there.
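
The idea of splitting the tree can be sketched as below. This version uses xargs -P rather than the make -j setup from my experiments, and parallel_cp_al is a made-up name. It assumes GNU cp, find and xargs, and that source and destination are on the same filesystem (required for hard links).

```shell
# parallel_cp_al SRC DEST [JOBS]: hard-link-copy each top-level entry of SRC
# into DEST, running up to JOBS cp processes at once (default 8).
# DEST must not live inside SRC.
parallel_cp_al() {
    src=$1 dest=$2 jobs=${3:-8}
    mkdir -p "$dest" || return 1
    # One cp -al per top-level entry; -print0 / -0 keep odd file names intact.
    find "$src" -mindepth 1 -maxdepth 1 -print0 |
        xargs -0 -r -P "$jobs" -n 1 cp -al -t "$dest"
}
```

Usage would be along the lines of `parallel_cp_al /path/to/src /path/to/dest 16`, with both paths being placeholders. The split here is only one level deep, so a tree with one very large top-level directory would still end up mostly serialized.
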

Lastly, I will mention that I see almost 0% wa (I/O wait) when watching top.