xfs
[Top] [All Lists]

Re: slow ls -l

To: Qing Chang <qchang@xxxxxxxxxxxxxxx>
Subject: Re: slow ls -l
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Wed, 13 Nov 2013 11:42:31 +1100
Cc: stan@xxxxxxxxxxxxxxxxx, xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <52827024.8020705@xxxxxxxxxxxxxxx>
References: <526EAFD1.1030500@xxxxxxxxxxxxxxx> <526F7600.8060102@xxxxxxxxxxxxxxxxx> <526FC027.3040906@xxxxxxxxxxxxxxx> <526FC84D.7070905@xxxxxxxxxxxxxxxxx> <5273FCCB.60601@xxxxxxxxxxxxxxx> <527741BF.1000309@xxxxxxxxxxxxxxxxx> <52827024.8020705@xxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Tue, Nov 12, 2013 at 01:15:00PM -0500, Qing Chang wrote:
> Hi Stan,
> 
> thank you very much for your comments, I need some time to
> follow through all of your suggestions.
> 
> Yes, this is a dovecot IMAP and sendmail SMTP server, and there
> are 1360 mailboxes.
> 
> With this email, I'd like to give you a bit more information with
> strace -T and others commands.
> 
> Because "# CONFIG_LOCK_STAT is not set", I can not run perf
> lock record. Since this is a production server, I am reluctant to
> rebuild the kernel on it. But I hope the info below may give
> you an idea that it is not "lock" related. As slow "ls -l" and "du"
> also happen on the two file servers (64TB and 32TB), it does not
> feel like it is lock related.
> 
> I run this "(time strace -T ls -l /var/spool/mail > ls-out) >& ls-trace"
> and got quite a bit data.
> 
> First "time" output:
> =====
> real    0m17.720s
> user    0m0.574s
> sys     0m1.271s
> =====
> 
> Some 14s was spent on 164 "poll" calls on fd=5 that took more than 10ms,
> total number of "poll" call is 6618!

So, think about it for a minute. What's that fd?

> ##### df=5 opened:
> socket(PF_FILE, SOCK_STREAM, 0)         = 5 <0.000016>
> fcntl(5, F_GETFL)                       = 0x2 (flags O_RDWR) <0.000010>
> fcntl(5, F_SETFL, O_RDWR|O_NONBLOCK)    = 0 <0.000010>
> fcntl(5, F_GETFD)                       = 0 <0.000010>
> fcntl(5, F_SETFD, FD_CLOEXEC)           = 0 <0.000010>
> connect(5, {sa_family=AF_FILE, path="/var/lib/sss/pipes/nss"}, 110) = 0 
> <0.001886>
> fstat(5, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0 <0.000010>

So, in some cases it's taking 10ms to get some information from the
other end of a pipe. ANd that pipe has something to do with SSSD.

> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) 
> <0.000012>
> sendto(5, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 
> 16 <0.000029>
> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) 
> <0.000012>
> sendto(5, "\1\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000027>
> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) 
> <0.000011>
> read(5, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000013>
> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) 
> <0.000011>
> read(5, "\1\0\0\0", 4)                  = 4 <0.000025>
> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) 
> <0.000011>
> sendto(5, "\24\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 
> 16 <0.000026>
> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) 
> <0.000011>
> sendto(5, "\225N\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000025>
> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) 
> <0.000515>
> read(5, "L\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000012>
> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) 
> <0.000011>
> read(5, "\1\0\0\0\0\0\0\0\225N\0\0\311\0\0\0qchang\0*\0Qing Ch"..., 60) = 60 
> <0.000026>

That looks like uid ->username and gid -> group name translations.
i.e. you've got a slow LDAP server and that is what is causing your
problems, not anything to do with XFS....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

<Prev in Thread] Current Thread [Next in Thread>