slow ls -l

Qing Chang qchang at sri.utoronto.ca
Tue Nov 12 12:15:00 CST 2013


Hi Stan,

thank you very much for your comments, I need some time to
follow through all of your suggestions.

Yes, this is a dovecot IMAP and sendmail SMTP server, and there
are 1360 mailboxes.

With this email, I'd like to give you a bit more information with
strace -T and others commands.

Because "# CONFIG_LOCK_STAT is not set", I can not run perf
lock record. Since this is a production server, I am reluctant to
rebuild the kernel on it. But I hope the info below may give
you an idea that it is not "lock" related. As slow "ls -l" and "du"
also happen on the two file servers (64TB and 32TB), it does not
feel like it is lock related.

I run this "(time strace -T ls -l /var/spool/mail > ls-out) >& ls-trace"
and got quite a bit data.

First "time" output:
=====
real    0m17.720s
user    0m0.574s
sys     0m1.271s
=====

Some 14s was spent on 164 "poll" calls on fd=5 that took more than 10ms,
total number of "poll" call is 6618!

Below is the trace output, red numbers are the three longest time taken, listed
at top are those that took more than 100ms, the three sections are for the top
three time taken, note the three mailboxes are not special compare to another
mailboxes, a run at another time would produce calls on different mailboxes.
Also, cache seams to play a role too, after a fresh run that takes long time,
subsequent runs give instant results:
=====
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.144634>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.104580>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.104812>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.181959>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.102520>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.107539>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.142462>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.215745>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.309854>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.166665>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.101035>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.113119>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.120745>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.110027>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.137906>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.249039>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.104079>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.250003>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.103791>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.102018>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.103551>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.103696>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.102164>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.105843>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.104413>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.101304>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.101869>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.101366>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.132958>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.309411>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.100342>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.101279>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.144725>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.109867>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.109951>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.290189>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.102449>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.101182>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.111699>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.107351>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.104558>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.115447>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.163904>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.266427>

##### df=5 opened:
socket(PF_FILE, SOCK_STREAM, 0)         = 5 <0.000016>
fcntl(5, F_GETFL)                       = 0x2 (flags O_RDWR) <0.000010>
fcntl(5, F_SETFL, O_RDWR|O_NONBLOCK)    = 0 <0.000010>
fcntl(5, F_GETFD)                       = 0 <0.000010>
fcntl(5, F_SETFD, FD_CLOEXEC)           = 0 <0.000010>
connect(5, {sa_family=AF_FILE, path="/var/lib/sss/pipes/nss"}, 110) = 0 <0.001886>
fstat(5, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0 <0.000010>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000012>
sendto(5, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 <0.000029>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000012>
sendto(5, "\1\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000027>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000011>
read(5, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000013>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000011>
read(5, "\1\0\0\0", 4)                  = 4 <0.000025>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000011>
sendto(5, "\24\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 <0.000026>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000011>
sendto(5, "\225N\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000025>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000515>
read(5, "L\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000012>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000011>
read(5, "\1\0\0\0\0\0\0\0\225N\0\0\311\0\0\0qchang\0*\0Qing Ch"..., 60) = 60 <0.000026>
open("/etc/group", O_RDONLY|O_CLOEXEC)  = 6 <0.000017>
fstat(6, {st_mode=S_IFREG|0644, st_size=790, ...}) = 0 <0.000010>
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd5d53f4000 <0.004539>
read(6, "root:x:0:root\nbin:x:1:root,bin,d"..., 4096) = 790 <0.000018>
read(6, "", 4096)                       = 0 <0.000011>
close(6)                                = 0 <0.000011>
munmap(0x7fd5d53f4000, 4096)            = 0 <0.000016>
open("/var/lib/sss/mc/group", O_RDONLY|O_CLOEXEC) = 6 <0.000021>
fstat(6, {st_mode=S_IFREG|0644, st_size=5206312, ...}) = 0 <0.000010>
mmap(NULL, 5206312, PROT_READ, MAP_SHARED, 6, 0) = 0x7fd5ce3c1000 <0.000015>
poll([{fd=5, events=POLLIN|POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000012>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000011>
sendto(5, "\24\0\0\0\"\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 <0.000031>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000011>
sendto(5, "\311\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000025>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000427>
read(5, "?\0\0\0\"\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000013>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000011>
read(5, "\1\0\0\0\0\0\0\0\311\0\0\0\3\0\0\0\0*\0qiu"..., 47) = 47 <0.000017>
lstat("/var/spool/mail/root", {st_mode=S_IFREG|0600, st_size=187649403, ...}) = 0 <0.000019>
lgetxattr("/var/spool/mail/root", "security.selinux", "unconfined_u:object_r:file_t:s0", 255) = 32 
<0.000020>
lstat("/var/spool/mail/root", {st_mode=S_IFREG|0600, st_size=187649403, ...}) = 0 <0.000014>
lgetxattr("/var/spool/mail/root", "system.posix_acl_access", 0x0, 0) = -1 ENODATA (No data 
available) <0.000018>
lgetxattr("/var/spool/mail/root", "system.posix_acl_default", 0x0, 0) = -1 ENODATA (No data 
available) <0.000016>
lstat("/var/spool/mail/agutstei", {st_mode=S_IFREG|0600, st_size=3864251, ...}) = 0 <0.000016>
lgetxattr("/var/spool/mail/agutstei", "security.selinux", "unconfined_u:object_r:file_t:s0", 255) = 
32 <0.000018>
lstat("/var/spool/mail/agutstei", {st_mode=S_IFREG|0600, st_size=3864251, ...}) = 0 <0.000015>
lgetxattr("/var/spool/mail/agutstei", "system.posix_acl_access", 0x0, 0) = -1 ENODATA (No data 
available) <0.000016>
lgetxattr("/var/spool/mail/agutstei", "system.posix_acl_default", 0x0, 0) = -1 ENODATA (No data 
available) <0.000015>
open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 7 <0.000017>
fstat(7, {st_mode=S_IFREG|0644, st_size=1835, ...}) = 0 <0.000011>
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd5ce3c0000 <0.000014>
read(7, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 1835 <0.000020>
read(7, "", 4096)                       = 0 <0.000011>
close(7)                                = 0 <0.000012>

###### took 0.309854s:
munmap(0x7fd5ce3c0000, 4096)            = 0 <0.000015>
poll([{fd=5, events=POLLIN|POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000011>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000010>
sendto(5, "\24\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 <0.000034>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000010>
sendto(5, "\253\260\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000335>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.309854>
read(5, "N\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000012>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000010>
read(5, "\1\0\0\0\0\0\0\0\253\260\0\0\253\1\0\0acqu\0*\0Chen Ge A"..., 62) = 62 <0.000016>
lstat("/var/spool/mail/araouf", {st_mode=S_IFREG|0600, st_size=3943191, ...}) = 0 <0.000052>
lgetxattr("/var/spool/mail/araouf", "security.selinux", "unconfined_u:object_r:file_t:s0", 255) = 32 
<0.000021>
lstat("/var/spool/mail/araouf", {st_mode=S_IFREG|0600, st_size=3943191, ...}) = 0 <0.000014>
lgetxattr("/var/spool/mail/araouf", "system.posix_acl_access", 0x0, 0) = -1 ENODATA (No data 
available) <0.000017>
lgetxattr("/var/spool/mail/araouf", "system.posix_acl_default", 0x0, 0) = -1 ENODATA (No data 
available) <0.000015>
open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 7 <0.000016>
fstat(7, {st_mode=S_IFREG|0644, st_size=1835, ...}) = 0 <0.000010>
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd5ce3c0000 <0.000012>
read(7, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 1835 <0.000018>
read(7, "", 4096)                       = 0 <0.000010>
close(7)                                = 0 <0.000012>

###### took 0.309411:
munmap(0x7fd5ce3c0000, 4096)            = 0 <0.000015>
poll([{fd=5, events=POLLIN|POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000010>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000009>
sendto(5, "\24\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 <0.000035>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000010>
sendto(5, "\357\246\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000342>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.309411>
read(5, "V\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000012>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000010>
read(5, "\1\0\0\0\0\0\0\0\357\246\0\0\253\1\0\0pthaya\0*\0Parthip"..., 70) = 70 <0.000018>
lstat("/var/spool/mail/ptsao", {st_mode=S_IFREG|0600, st_size=515358, ...}) = 0 <0.000025>
lgetxattr("/var/spool/mail/ptsao", "security.selinux", "unconfined_u:object_r:file_t:s0", 255) = 32 
<0.000021>
lstat("/var/spool/mail/ptsao", {st_mode=S_IFREG|0600, st_size=515358, ...}) = 0 <0.000014>
lgetxattr("/var/spool/mail/ptsao", "system.posix_acl_access", 0x0, 0) = -1 ENODATA (No data 
available) <0.000019>
lgetxattr("/var/spool/mail/ptsao", "system.posix_acl_default", 0x0, 0) = -1 ENODATA (No data 
available) <0.000014>
open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 7 <0.000017>
fstat(7, {st_mode=S_IFREG|0644, st_size=1835, ...}) = 0 <0.000010>
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd5ce3c0000 <0.000012>
read(7, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 1835 <0.000019>
read(7, "", 4096)                       = 0 <0.000009>
close(7)                                = 0 <0.000011>

###### took 0.290189:
munmap(0x7fd5ce3c0000, 4096)            = 0 <0.000015>
poll([{fd=5, events=POLLIN|POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000011>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000010>
sendto(5, "\24\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 <0.000034>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000010>
sendto(5, "\10\270\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000391>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.290189>
read(5, "N\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000012>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000010>
read(5, "\1\0\0\0\0\0\0\0\10\270\0\0\327\1\0\0taberk\0*\0Thomas "..., 62) = 62 <0.000017>
lstat("/var/spool/mail/taha", {st_mode=S_IFREG|0600, st_size=293648143, ...}) = 0 <0.000018>
lgetxattr("/var/spool/mail/taha", "security.selinux", "unconfined_u:object_r:file_t:s0", 255) = 32 
<0.000022>
lstat("/var/spool/mail/taha", {st_mode=S_IFREG|0600, st_size=293648143, ...}) = 0 <0.000013>
lgetxattr("/var/spool/mail/taha", "system.posix_acl_access", 0x0, 0) = -1 ENODATA (No data 
available) <0.000021>
lgetxattr("/var/spool/mail/taha", "system.posix_acl_default", 0x0, 0) = -1 ENODATA (No data 
available) <0.000014>
open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 7 <0.000021>
fstat(7, {st_mode=S_IFREG|0644, st_size=1835, ...}) = 0 <0.000010>
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd5ce3c0000 <0.000015>
read(7, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 1835 <0.000033>
read(7, "", 4096)                       = 0 <0.000011>
close(7)                                = 0 <0.000013>
=====

====== iostat around the time ls -l was run =====
[root at dovecot2 ~]# iostat -xmz 5
Linux 2.6.32-358.18.1.el6.x86_64 (dovecot2.sri.utoronto.ca) 11/12/2013      _x86_64_        (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            0.56    0.00    0.17    0.06    0.00   99.21

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  
%util
sda               0.01    19.88    0.14   26.86     0.00     0.18 14.18     0.02    0.57   0.28   0.75
sdb               0.23     0.08    1.75    1.98     0.04     0.02 30.38     0.01    2.81   1.26   0.47
sdc               0.00     0.03   13.78    0.71     1.34     0.05 196.35     0.02    1.60   0.77   1.11
dm-0              0.00     0.00    0.09    2.42     0.00 0.01     8.49     0.00    1.18   0.52   0.13
dm-1              0.00     0.00    0.00    0.00     0.00 0.00     8.00     0.00    3.09   0.27   0.00
dm-2              0.00     0.00    0.06   44.33     0.00 0.17     8.14     0.04    0.90   0.14   0.63

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            1.31    0.00    0.46    0.10    0.00   98.12

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  
%util
sda               0.00    27.40    1.00   65.80     0.02     0.36 11.78     0.04    0.61   0.43   2.86
sdb               0.00     0.00    1.80    0.00     0.03     0.00 31.11     0.01    6.44   2.00   0.36
sdc               0.00     1.00    0.00    4.80     0.00     0.03 11.79     0.03    6.88   0.46   0.22
dm-0              0.00     0.00    0.80    1.20     0.02     0.00 24.00     0.01    6.10   6.10   1.22
dm-2              0.00     0.00    0.00   92.00     0.00 0.36     8.00     0.06    0.62   0.19   1.76

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            5.76    0.00    1.62    0.59    0.00   92.03

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  
%util
sda               0.00  1045.60    0.80  959.40     0.05     7.83 16.82     0.44    0.46   0.21  19.98
sdb               0.00     0.00    1.40    1.20     0.01 0.01     9.23     0.00    0.62   0.54   0.14
sdc               0.00     0.00    0.00    0.60     0.00     0.01 18.67     0.00    0.33   0.33   0.02
dm-0              0.00     0.00    0.80    8.00     0.05     0.03 20.00     0.02    2.05   1.41   1.24
dm-2              0.00     0.00    0.00 1997.00     0.00 7.80     8.00     1.24    0.62   0.09  18.86

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            4.38    0.00    1.29    0.67    0.00   93.66

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  
%util
sda               0.00  1204.40    0.00 1074.60     0.00     8.90 16.97     0.41    0.38   0.20  21.04
sdb               0.20     0.00    2.00    0.00     0.01     0.00 13.60     0.01    6.80   4.00   0.80
dm-0              0.00     0.00    0.00    9.00     0.00 0.04     8.00     0.01    1.27   0.42   0.38
dm-2              0.00     0.00    0.00 2270.00     0.00 8.87     8.00     1.33    0.59   0.09  20.74

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            4.24    0.00    1.16    0.58    0.00   94.02

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  
%util
sda               0.00  1159.20    0.00  985.80     0.00     8.38 17.41     0.34    0.35   0.18  17.84
sdb               0.00     0.00    0.20    1.40     0.00 0.01     8.62     0.00    0.75   0.75   0.12
sdc               0.00     0.00    0.00    0.20     0.00     0.00 16.00     0.00    1.00   1.00   0.02
dm-0              0.00     0.00    0.00    5.00     0.00 0.02     8.00     0.00    0.24   0.24   0.12
dm-2              0.00     0.00    0.00 2140.00     0.00 8.36     8.00     1.23    0.58   0.08  17.90

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            2.77    0.00    1.14    0.54    0.00   95.55

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  
%util
sda               0.00   768.20    0.00  847.40     0.00     6.31 15.25     0.36    0.43   0.19  16.40
sdb               0.00     0.00    0.00    2.00     0.00 0.01     7.00     0.00    0.10   0.10   0.02
sdc               0.00     0.00    0.00    2.00     0.00     0.04 37.60     0.00    0.70   0.40   0.08
dm-0              0.00     0.00    0.00   18.80     0.00 0.07     8.00     0.09    4.84   0.27   0.50
dm-2              0.00     0.00    0.00 1596.80     0.00 6.24     8.00     0.84    0.52   0.10  16.22
=====

===== vmstat =====
[root at dovecot2 ~]# vmstat 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
  2  0   3476 785664 223836 29329692    0    0    88    16    1    1 1  0 99  0  0
  1  0   3476 783512 223844 29330448    0    0    84  3383 2210 3985 3  1 96  0  0
  0  1   3476 779328 223848 29331208    0    0    20  9254 4530 9894 5  2 92  1  0
  2  0   3476 778616 223856 29331956    0    0     1  9104 4160 10909  4  1 94  1  0
  1  0   3476 777584 223864 29332932    0    0     0 10056 4428 10263  4  1 94  1  0
  0  0   3476 778312 223872 29333044    0    0     2   660  921 1871 1  0 99  0  0
=====

Really appreciate your willingness to help and your time spent on this!

Qing

On 04/11/2013 1:42 AM, Stan Hoeppner wrote:
> On 11/1/2013 2:11 PM, Qing Chang wrote:
>> I have collected some info following the FAQ guide. It may still
>> not be enough. Please let me know.
> You say /var/spool/mail contains 1360 "items".  That's ambiguous.  You
> mean 1360 user mbox files, yes?
>
>> # time to run ls -l (1360 items):
>> time ls -l /var/spool/mail
>> real    0m10.271s
>> user    0m0.097s
>> sys     0m0.162s
> The combined CPU execution time is apparently 0.259s.  Something else is
> adding 10s+ of delay.  mbox is notorious for locking problems.  I wonder
> if stat() is waiting on release of exclusive write locks being held by
> your MDA during delivery.  This would also tend to explain the
> variability you demonstrated, with one run being 10s and another being
> 7s--fewer files locked on the second run.
>
> Run 'ls' with "strace -T" to find out which system call is adding the
> delay.  Run perf-lock concurrently to see the locks.
>
> More comments below.
>
>
>> # Oct 29, 2013
>> #
>> # XFS slowness with ls du etc.
>>
>> kernel version (uname -a)
>> Linux dovecot2.sri.utoronto.ca 2.6.32-358.18.1.el6.x86_64 #1 SMP Fri Aug 2 17:04:38 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
>>
>> xfsprogs version (xfs_repair -V)
>> xfs_repair version 3.1.1
>>
>> number of CPUs: 16 virtual CPUs
>>
>> contents of /proc/meminfo
>> MemTotal:       32878632 kB
>> MemFree:          310260 kB
>> Buffers:          220496 kB
>> Cached:         29971428 kB
>> SwapCached:          212 kB
>> Active:          9726912 kB
>> Inactive:       21180124 kB
>> Active(anon):     385040 kB
>> Inactive(anon):   330260 kB
>> Active(file):    9341872 kB
>> Inactive(file): 20849864 kB
>> Unevictable:           0 kB
>> Mlocked:               0 kB
>> SwapTotal:       6160376 kB
>> SwapFree:        6159896 kB
>> Dirty:               772 kB
>> Writeback:             0 kB
>> AnonPages:        714912 kB
>> Mapped:            52696 kB
>> Shmem:               184 kB
>> Slab:            1045988 kB
>> SReclaimable:     776440 kB
>> SUnreclaim:       269548 kB
>> KernelStack:        8736 kB
>> PageTables:        57172 kB
>> NFS_Unstable:          0 kB
>> Bounce:                0 kB
>> WritebackTmp:          0 kB
>> CommitLimit:    22599692 kB
>> Committed_AS:    1148180 kB
>> VmallocTotal:   34359738367 kB
>> VmallocUsed:      335224 kB
>> VmallocChunk:   34333710552 kB
>> HardwareCorrupted:     0 kB
>> AnonHugePages:    180224 kB
>> HugePages_Total:       0
>> HugePages_Free:        0
>> HugePages_Rsvd:        0
>> HugePages_Surp:        0
>> Hugepagesize:       2048 kB
>> DirectMap4k:       10240 kB
>> DirectMap2M:    33544192 kB
>>
>> contents of /proc/mounts
>> [root at dovecot2 ~]# cat /proc/mounts
>> rootfs / rootfs rw 0 0
>> proc /proc proc rw,relatime 0 0
>> sysfs /sys sysfs rw,seclabel,relatime 0 0
>> devtmpfs /dev devtmpfs rw,seclabel,relatime,size=16429456k,nr_inodes=4107364,mode=755 0 0
>> devpts /dev/pts devpts rw,seclabel,relatime,gid=5,mode=620,ptmxmode=000 0 0
>> tmpfs /dev/shm tmpfs rw,seclabel,relatime 0 0
>> /dev/mapper/vg_ipa1-lv_root / ext4 rw,seclabel,relatime,barrier=1,data=ordered 0 0
>> none /selinux selinuxfs rw,relatime 0 0
>> devtmpfs /dev devtmpfs rw,seclabel,relatime,size=16429456k,nr_inodes=4107364,mode=755 0 0
>> /proc/bus/usb /proc/bus/usb usbfs rw,relatime 0 0
>> /dev/sda1 /boot ext4 rw,seclabel,relatime,barrier=1,data=ordered 0 0
>> /dev/mapper/vg_ipa1-lv_home /var ext4 rw,seclabel,relatime,barrier=1,data=ordered 0 0
>> /dev/sdb1 /mailhomes xfs rw,seclabel,relatime,attr2,delaylog,nobarrier,inode64,usrquota 0 0
>> /dev/sdc1 /var/spool/mail xfs rw,seclabel,relatime,attr2,delaylog,nobarrier,inode64,usrquota 0 0
>> none /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0
>> sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
>>
>> contents of /proc/partitions
>> major minor  #blocks  name
>>
>>     8        0   83886080 sda
>>     8        1     512000 sda1
>>     8        2   83373056 sda2
>>     8       16 5368709120 sdb
>>     8       17 5368707072 sdb1
>>     8       32 2147483648 sdc
>>     8       33 2147481600 sdc1
>>   253        0   56729600 dm-0
>>   253        1    6160384 dm-1
>>   253        2   20480000 dm-2
>>
>> RAID layout (hardware and/or software): Hardware RAID6 IBM Storwize V7000
> What we're looking for here is the RAID geometry of the local disk
> device, logical volume, SAN LUN, etc, underlying the XFS filesystem,
> i.e. number of disks in the array, RAID level, stripe unit and stripe width.
>
>> LVM configuration: XFS not configured
>>
>> type of disks you are using: SAS dirves
>>
>> write cache status of drives: V7000 has 16 GB of cache
>>
>> size of BBWC and mode it is running in: N/A
> The 16GB of BBWC in the V7000 defaults to write-back mode, or has been
> manually enabled.  If not you're playing Russian roulette, because
> you're mounting with nobarrier.  You only disable barriers if you have
> persistent write cache, either battery or flash based.
>
>> xfs_info output on the filesystem in question
>> meta-data=/dev/sdc1              isize=256    agcount=4, agsize=134217600 blks
>>           =                       sectsz=512   attr=2
>> data     =                       bsize=4096   blocks=536870400, imaxpct=5
>>           =                       sunit=0      swidth=0 blks
>> naming   =version 2              bsize=4096   ascii-ci=0
>> log      =internal               bsize=4096   blocks=262143, version=2
>>           =                       sectsz=512   sunit=0 blks, lazy-count=1
>> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 2TB seems rather large for a 1360 user mail spool directory, given new
> mail is apparently moved from INBOX to other folders in /mailhomes.
>
> Additionally, given the innate parallelism in a mail workload, 4 AGs is
> far less than optimal.  XFS throughput with a highly parallel workload
> is limited by the number of allocation groups.  This filesystem was
> apparently created with mkfs.xfs defaults.  It should have been created
> with many more AGs, 16 minimum.  Same goes for the /mailhomes filesystem.
>
>> dmesg output showing all error messages and stack traces: N/A
>>
>> # time to run ls -l (1360 items):
>> time ls -l /var/spool/mail
>> real    0m10.271s
>> user    0m0.097s
>> sys     0m0.162s
>>
>> # iostat when ls -l is running:
>> [root at dovecot2 ~]# date;iostat -xdmz 5
>> Tue Oct 29 14:44:28 EDT 2013
>> Linux 2.6.32-358.18.1.el6.x86_64 (dovecot2.sri.utoronto.ca)     10/29/2013      _x86_64_        (16 CPU)
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
>> sda               0.02    17.87    0.17   24.49     0.01     0.17    14.16     0.01    0.56   0.28   0.68
>> sdb               0.19     0.08    1.62    1.94     0.04     0.02    33.47     0.01    2.76   1.23   0.44
>> sdc               0.00     0.03   11.57    0.69     1.10     0.05   192.23     0.02    1.65   0.78   0.96
>> dm-0              0.00     0.00    0.11    2.24     0.00     0.01     8.70     0.00    1.14   0.51   0.12
>> dm-1              0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    1.26   0.24   0.00
>> dm-2              0.00     0.00    0.08   40.11     0.00     0.16     8.18     0.03    0.86   0.14   0.57
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
>> sda               0.00   155.00    0.00  127.20     0.00     1.10    17.75     0.06    0.45   0.25   3.22
>> sdb               0.00     0.20    0.00   11.60     0.00     0.07    11.81     0.05    4.07   0.17   0.20
>> sdc               0.00     0.00    0.00    0.40     0.00     0.00    12.00     0.00    0.50   0.50   0.02
>> dm-0              0.00     0.00    0.00    1.20     0.00     0.00     8.00     0.01    4.50   4.50   0.54
>> dm-2              0.00     0.00    0.00  281.40     0.00     1.10     8.00     0.19    0.68   0.10   2.74
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
>> sda               0.00  1480.80    0.00  627.20     0.00     8.23    26.89     0.28    0.45   0.20  12.70
>> sdb               0.00     0.00    0.40    4.00     0.00     0.03    18.27     0.00    1.05   1.05   0.46
>> sdc               0.00     0.00    0.00    1.40     0.00     0.00     6.71     0.00    0.14   0.14   0.02
>> dm-0              0.00     0.00    0.00   17.20     0.00     0.07     8.00     0.02    1.09   0.09   0.16
>> dm-2              0.00     0.00    0.00 2090.20     0.00     8.16     8.00     1.65    0.79   0.06  12.52
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
>> sda               0.00  1331.00    0.00  629.80     0.00     7.66    24.91     0.28    0.44   0.22  13.64
>> sdb               0.00     0.00    0.00    2.80     0.00     0.02    14.07     0.00    0.79   0.57   0.16
>> sdc               0.00     0.00    0.00    1.40     0.00     0.02    30.14     0.00    0.57   0.57   0.08
>> dm-0              0.00     0.00    0.00    1.20     0.00     0.00     8.00     0.00    3.83   3.83   0.46
>> dm-2              0.00     0.00    0.00 1959.60     0.00     7.65     8.00     1.43    0.73   0.07  13.20
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
>> sda               0.00     6.60    0.00    0.60     0.00     0.03    96.00     0.00    1.33   1.33   0.08
>> sdb               0.00     0.20    0.00    0.80     0.00     0.00     8.25     0.00    0.75   0.75   0.06
>> sdc               0.00     0.20    0.00    0.20     0.00     0.00    16.00     0.00    0.00   0.00   0.00
>> dm-0              0.00     0.00    0.00    6.20     0.00     0.02     8.00     0.01    0.97   0.03   0.02
>> dm-2              0.00     0.00    0.00    1.00     0.00     0.00     8.00     0.00    3.00   0.60   0.06
>>
>> # a second run of ls -l:
>> [root at dovecot2 ~]# date;time ls -l /var/spool/mail
>> Fri Nov  1 15:04:43 EDT 2013
>> real    0m7.814s
>> user    0m0.096s
>> sys     0m0.139s
>>
>> [root at dovecot2 ~]# date;iostat -xmdz 5
>> Fri Nov  1 15:04:41 EDT 2013
>> Linux 2.6.32-358.18.1.el6.x86_64 (dovecot2.sri.utoronto.ca)     11/01/2013      _x86_64_        (16 CPU)
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
>> sda               0.01    21.20    0.16   28.28     0.00     0.19    14.27     0.02    0.57   0.27   0.78
>> sdb               0.22     0.09    1.70    2.19     0.04     0.02    31.03     0.01    2.83   1.20   0.47
>> sdc               0.00     0.04   12.63    0.75     1.21     0.05   192.59     0.02    1.60   0.77   1.04
>> dm-0              0.00     0.00    0.11    2.46     0.00     0.01     8.57     0.00    1.17   0.51   0.13
>> dm-1              0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    2.03   0.21   0.00
>> dm-2              0.00     0.00    0.07   47.02     0.00     0.18     8.15     0.04    0.93   0.14   0.65
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
>> sda               0.00   730.20    0.00 1251.60     0.00     7.76    12.69     0.52    0.42   0.21  25.86
>> sdb               0.00     1.00    0.80   25.40     0.01     0.16    12.62     0.01    0.37   0.18   0.46
>> sdc               0.00     0.00    0.00    2.60     0.00     0.03    20.62     0.00    0.31   0.23   0.06
>> dm-0              0.00     0.00    0.00    0.20     0.00     0.00     8.00     0.00    3.00   3.00   0.06
>> dm-2              0.00     0.00    0.00 1981.00     0.00     7.74     8.00     1.04    0.52   0.13  25.96
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
>> sda               0.00  1544.20    0.00 2844.60     0.00    17.14    12.34     1.10    0.39   0.21  58.34
>> sdb               0.00     0.00    0.00    1.20     0.00     0.01     9.33     0.00    0.00   0.00   0.00
>> sdc               0.00     0.00    0.00    0.60     0.00     0.00     8.00     0.00    3.00   1.00   0.06
>> dm-0              0.00     0.00    0.00    3.00     0.00     0.01     8.00     0.00    0.07   0.07   0.02
>> dm-2              0.00     0.00    0.00 4386.00     0.00    17.13     8.00     2.19    0.50   0.13  58.46
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
>> sda               0.00   104.80    0.00  212.80     0.00     1.24    11.94     0.09    0.41   0.21   4.44
>> sdb               0.00     0.00    0.20    3.00     0.00     0.01     9.81     0.00    0.50   0.50   0.16
>> sdc               0.00     0.20    0.00    2.80     0.00     0.02    16.86     0.01    4.43   0.79   0.22
>> dm-0              0.00     0.00    0.00    0.40     0.00     0.00     8.00     0.00    5.50   5.50   0.22
>> dm-2              0.00     0.00    0.00  317.00     0.00     1.24     8.00     0.15    0.47   0.13   4.24
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
>> sda               0.00     6.60    0.00    1.20     0.00     0.03    52.00     0.00    0.33   0.33   0.04
>> sdb               0.00     0.20    0.00    1.00     0.00     0.00     9.60     0.00    0.00   0.00   0.00
>> sdc               0.00     0.00    0.00    0.20     0.00     0.00     2.00     0.00    1.00   1.00   0.02
>> dm-0              0.00     0.00    0.00    4.40     0.00     0.02     8.00     0.00    0.95   0.05   0.02
>> dm-2              0.00     0.00    0.00    3.40     0.00     0.01     8.00     0.00    0.06   0.06   0.02
>>
>> [root at dovecot2 ~]# date;vmstat 5
>> Fri Nov  1 15:04:39 EDT 2013
>> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
>>   r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>>   0  0   1304 542644 229484 29885592    0    0    80    17    1    1  1  0 99  0  0
>>   0  1   1304 542752 229484 29885696    0    0     0  2211 1518 1281  1  0 98  0  0
>>   0  1   1304 544736 229484 29885696    0    0     6 17216 7535 7631  3  2 94  2  0
>>   0  0   1304 543988 229484 29885696    0    0     0  7403 3782 3889  1  1 97  1  0
>>   0  0   1304 543604 229484 29885712    0    0     1   215  399  277  0  0 100  0  0
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://oss.sgi.com/pipermail/xfs/attachments/20131112/b4c3807c/attachment-0001.html>


More information about the xfs mailing list