xfs
[Top] [All Lists]

Re: slow ls -l

To: Linux fs XFS <xfs@xxxxxxxxxxx>
Subject: Re: slow ls -l
From: pg@xxxxxxxxxxxxxxxxxxx (Peter Grandi)
Date: Tue, 12 Nov 2013 23:25:03 +0000
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <52827024.8020705@xxxxxxxxxxxxxxx>
References: <526EAFD1.1030500@xxxxxxxxxxxxxxx> <526F7600.8060102@xxxxxxxxxxxxxxxxx> <526FC027.3040906@xxxxxxxxxxxxxxx> <526FC84D.7070905@xxxxxxxxxxxxxxxxx> <5273FCCB.60601@xxxxxxxxxxxxxxx> <527741BF.1000309@xxxxxxxxxxxxxxxxx> <52827024.8020705@xxxxxxxxxxxxxxx>
[ ... ]

> Yes, this is a dovecot IMAP and sendmail SMTP server, and
> there are 1360 mailboxes.

That's more or less worst case for metadata accesses. Probably
XFS is performing very well, and 'ls -l' is as fast as it can be.
But you are only listing 1360 inodes, so even if slow it should
not be that slow.

> As slow "ls -l" and "du" also happen on the two file servers
> (64TB and 32TB), [ ... ]

People who "know better" like large partitions like that. Then
you pay the consequences later... But that's another issue :-).

> I run this "(time strace -T ls -l /var/spool/mail > ls-out) >& ls-trace"
> and got quite a bit data.

> First "time" output:
> =====
> real    0m17.720s
> user    0m0.574s
> sys     0m1.271s
> =====

> Some 14s was spent on 164 "poll" calls on fd=5 that took more than 10ms,
> total number of "poll" call is 6618!

That's for network calls. Nothing to do with filesystem:

> connect(5, {sa_family=AF_FILE, path="/var/lib/sss/pipes/nss"}, 110) = 0 
> <0.001886>
> fstat(5, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0 <0.000010>
> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) 
> <0.000012>
> sendto(5, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 
> 16 <0.000029>
> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) 
> <0.000012>
> sendto(5, "\1\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000027>
> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) 
> <0.000011>
> read(5, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000013>
> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) 
> <0.000011>
> read(5, "\1\0\0\0", 4)                  = 4 <0.000025>
> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) 
> <0.000011>
> sendto(5, "\24\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 
> 16 <0.000026>
> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) 
> <0.000011>
> sendto(5, "\225N\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000025>
> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) 
> <0.000515>
> read(5, "L\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000012>
> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) 
> <0.000011>
> read(5, "\1\0\0\0\0\0\0\0\225N\0\0\311\0\0\0qchang\0*\0Qing Ch"..., 60) = 60 
> <0.000026>

This is 'getpwuid' as 'ls' tries to resolve a (numeric) user id
to a (string) user name and you have SSS configured in
'/etc/nsswith.conf' and it is probably using some very slow
network directory system, but whatever it is, it is not
a filesystem matter.

You can easily double check by running 'ls -l -n' which will
'stat' the 1360 i-nodes all the same but without resolving uids
to usernames, and that should be 14s faster based on what you
report above.

This is the busiest IO load section you have provided:

> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>             4.24    0.00    1.16    0.58    0.00   94.02

> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s wMB/s avgrq-sz 
> avgqu-sz   await  svctm   %util
> sda               0.00  1159.20    0.00  985.80     0.00 8.38 17.41     0.34  
>   0.35   0.18  17.84
> sdb               0.00     0.00    0.20    1.40     0.00 0.01  8.62     0.00  
>   0.75   0.75   0.12
> sdc               0.00     0.00    0.00    0.20     0.00 0.00 16.00     0.00  
>   1.00   1.00   0.02
> dm-0              0.00     0.00    0.00    5.00     0.00 0.02  8.00     0.00  
>   0.24   0.24   0.12
> dm-2              0.00     0.00    0.00 2140.00     0.00 8.36  8.00     1.23  
>   0.58   0.08  17.90

and it shows only some low bandwidth writing, and very low IOPS
utilization anyhow, so more confirmation this is nothing to do
with XFS or storage in general.

As to why 'du' is slow, well, it does not by default resolve
uids to usernames, so that is probably because you have lots of
small inodes scattered all over the place. Guessing wildly your
Dovecot uses the catastrophic Maildir format or a variant, and
as to your 64/32TB filetrees, good luck... :-)

<Prev in Thread] Current Thread [Next in Thread>