slow ls -l
Qing Chang
qchang at sri.utoronto.ca
Wed Nov 13 08:23:31 CST 2013
On 12/11/2013 7:42 PM, Dave Chinner wrote:
> On Tue, Nov 12, 2013 at 01:15:00PM -0500, Qing Chang wrote:
>> Hi Stan,
>>
>> thank you very much for your comments, I need some time to
>> follow through all of your suggestions.
>>
>> Yes, this is a dovecot IMAP and sendmail SMTP server, and there
>> are 1360 mailboxes.
>>
>> With this email, I'd like to give you a bit more information with
>> strace -T and others commands.
>>
>> Because "# CONFIG_LOCK_STAT is not set", I can not run perf
>> lock record. Since this is a production server, I am reluctant to
>> rebuild the kernel on it. But I hope the info below may give
>> you an idea that it is not "lock" related. As slow "ls -l" and "du"
>> also happen on the two file servers (64TB and 32TB), it does not
>> feel like it is lock related.
>>
>> I run this "(time strace -T ls -l /var/spool/mail > ls-out) >& ls-trace"
>> and got quite a bit data.
>>
>> First "time" output:
>> =====
>> real 0m17.720s
>> user 0m0.574s
>> sys 0m1.271s
>> =====
>>
>> Some 14s was spent on 164 "poll" calls on fd=5 that took more than 10ms,
>> total number of "poll" call is 6618!
> So, think about it for a minute. What's that fd?
>
>> ##### df=5 opened:
>> socket(PF_FILE, SOCK_STREAM, 0) = 5 <0.000016>
>> fcntl(5, F_GETFL) = 0x2 (flags O_RDWR) <0.000010>
>> fcntl(5, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000010>
>> fcntl(5, F_GETFD) = 0 <0.000010>
>> fcntl(5, F_SETFD, FD_CLOEXEC) = 0 <0.000010>
>> connect(5, {sa_family=AF_FILE, path="/var/lib/sss/pipes/nss"}, 110) = 0 <0.001886>
>> fstat(5, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0 <0.000010>
> So, in some cases it's taking 10ms to get some information from the
> other end of a pipe. ANd that pipe has something to do with SSSD.
>
>> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000012>
>> sendto(5, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 <0.000029>
>> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000012>
>> sendto(5, "\1\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000027>
>> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000011>
>> read(5, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000013>
>> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000011>
>> read(5, "\1\0\0\0", 4) = 4 <0.000025>
>> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000011>
>> sendto(5, "\24\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 <0.000026>
>> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000011>
>> sendto(5, "\225N\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000025>
>> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000515>
>> read(5, "L\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000012>
>> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000011>
>> read(5, "\1\0\0\0\0\0\0\0\225N\0\0\311\0\0\0qchang\0*\0Qing Ch"..., 60) = 60 <0.000026>
> That looks like uid ->username and gid -> group name translations.
> i.e. you've got a slow LDAP server and that is what is causing your
> problems, not anything to do with XFS....
That's the nail on the head! This also explains why subsequent runs are faster because
SSSD caches. (I did a ls -l -n:-)
A fat thanks to all that replied, especially Stan for keeping this rolling.
BTW, my choice of XFS mainly _is_ because it supports huge size and I am happy
so far with it! Happier now...
Sincerely,
Qing
> Cheers,
>
> Dave.
More information about the xfs
mailing list