xfs
[Top] [All Lists]

Re: slow ls -l

To: stan@xxxxxxxxxxxxxxxxx, xfs@xxxxxxxxxxx
Subject: Re: slow ls -l
From: Qing Chang <qchang@xxxxxxxxxxxxxxx>
Date: Tue, 12 Nov 2013 13:15:00 -0500
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <527741BF.1000309@xxxxxxxxxxxxxxxxx>
References: <526EAFD1.1030500@xxxxxxxxxxxxxxx> <526F7600.8060102@xxxxxxxxxxxxxxxxx> <526FC027.3040906@xxxxxxxxxxxxxxx> <526FC84D.7070905@xxxxxxxxxxxxxxxxx> <5273FCCB.60601@xxxxxxxxxxxxxxx> <527741BF.1000309@xxxxxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.1.0
Hi Stan,

thank you very much for your comments, I need some time to
follow through all of your suggestions.

Yes, this is a dovecot IMAP and sendmail SMTP server, and there
are 1360 mailboxes.

With this email, I'd like to give you a bit more information with
strace -T and others commands.

Because "# CONFIG_LOCK_STAT is not set", I can not run perf
lock record. Since this is a production server, I am reluctant to
rebuild the kernel on it. But I hope the info below may give
you an idea that it is not "lock" related. As slow "ls -l" and "du"
also happen on the two file servers (64TB and 32TB), it does not
feel like it is lock related.

I run this "(time strace -T ls -l /var/spool/mail > ls-out) >& ls-trace"
and got quite a bit data.

First "time" output:
=====
real    0m17.720s
user    0m0.574s
sys     0m1.271s
=====

Some 14s was spent on 164 "poll" calls on fd=5 that took more than 10ms,
total number of "poll" call is 6618! 

Below is the trace output, red numbers are the three longest time taken, listed
at top are those that took more than 100ms, the three sections are for the top
three time taken, note the three mailboxes are not special compare to another
mailboxes, a run at another time would produce calls on different mailboxes.
Also, cache seams to play a role too, after a fresh run that takes long time,
subsequent runs give instant results:
=====
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.144634>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.104580>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.104812>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.181959>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.102520>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.107539>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.142462>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.215745>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.309854>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.166665>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.101035>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.113119>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.120745>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.110027>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.137906>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.249039>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.104079>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.250003>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.103791>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.102018>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.103551>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.103696>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.102164>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.105843>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.104413>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.101304>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.101869>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.101366>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.132958>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.309411>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.100342>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.101279>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.144725>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.109867>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.109951>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.290189>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.102449>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.101182>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.111699>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.107351>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.104558>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.115447>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.163904>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.266427>

##### df=5 opened:
socket(PF_FILE, SOCK_STREAM, 0)         = 5 <0.000016>
fcntl(5, F_GETFL)                       = 0x2 (flags O_RDWR) <0.000010>
fcntl(5, F_SETFL, O_RDWR|O_NONBLOCK)    = 0 <0.000010>
fcntl(5, F_GETFD)                       = 0 <0.000010>
fcntl(5, F_SETFD, FD_CLOEXEC)           = 0 <0.000010>
connect(5, {sa_family=AF_FILE, path="/var/lib/sss/pipes/nss"}, 110) = 0 <0.001886>
fstat(5, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0 <0.000010>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000012>
sendto(5, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 <0.000029>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000012>
sendto(5, "\1\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000027>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000011>
read(5, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000013>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000011>
read(5, "\1\0\0\0", 4)                  = 4 <0.000025>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000011>
sendto(5, "\24\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 <0.000026>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000011>
sendto(5, "\225N\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000025>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000515>
read(5, "L\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000012>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000011>
read(5, "\1\0\0\0\0\0\0\0\225N\0\0\311\0\0\0qchang\0*\0Qing Ch"..., 60) = 60 <0.000026>
open("/etc/group", O_RDONLY|O_CLOEXEC)  = 6 <0.000017>
fstat(6, {st_mode=S_IFREG|0644, st_size=790, ...}) = 0 <0.000010>
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd5d53f4000 <0.004539>
read(6, "root:x:0:root\nbin:x:1:root,bin,d"..., 4096) = 790 <0.000018>
read(6, "", 4096)                       = 0 <0.000011>
close(6)                                = 0 <0.000011>
munmap(0x7fd5d53f4000, 4096)            = 0 <0.000016>
open("/var/lib/sss/mc/group", O_RDONLY|O_CLOEXEC) = 6 <0.000021>
fstat(6, {st_mode=S_IFREG|0644, st_size=5206312, ...}) = 0 <0.000010>
mmap(NULL, 5206312, PROT_READ, MAP_SHARED, 6, 0) = 0x7fd5ce3c1000 <0.000015>
poll([{fd=5, events=POLLIN|POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000012>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000011>
sendto(5, "\24\0\0\0\"\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 <0.000031>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000011>
sendto(5, "\311\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000025>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000427>
read(5, "?\0\0\0\"\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000013>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000011>
read(5, "\1\0\0\0\0\0\0\0\311\0\0\0\3\0\0\0\0*\0qiu"..., 47) = 47 <0.000017>
lstat("/var/spool/mail/root", {st_mode=S_IFREG|0600, st_size=187649403, ...}) = 0 <0.000019>
lgetxattr("/var/spool/mail/root", "security.selinux", "unconfined_u:object_r:file_t:s0", 255) = 32 <0.000020>
lstat("/var/spool/mail/root", {st_mode=S_IFREG|0600, st_size=187649403, ...}) = 0 <0.000014>
lgetxattr("/var/spool/mail/root", "system.posix_acl_access", 0x0, 0) = -1 ENODATA (No data available) <0.000018>
lgetxattr("/var/spool/mail/root", "system.posix_acl_default", 0x0, 0) = -1 ENODATA (No data available) <0.000016>
lstat("/var/spool/mail/agutstei", {st_mode=S_IFREG|0600, st_size=3864251, ...}) = 0 <0.000016>
lgetxattr("/var/spool/mail/agutstei", "security.selinux", "unconfined_u:object_r:file_t:s0", 255) = 32 <0.000018>
lstat("/var/spool/mail/agutstei", {st_mode=S_IFREG|0600, st_size=3864251, ...}) = 0 <0.000015>
lgetxattr("/var/spool/mail/agutstei", "system.posix_acl_access", 0x0, 0) = -1 ENODATA (No data available) <0.000016>
lgetxattr("/var/spool/mail/agutstei", "system.posix_acl_default", 0x0, 0) = -1 ENODATA (No data available) <0.000015>
open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 7 <0.000017>
fstat(7, {st_mode=S_IFREG|0644, st_size=1835, ...}) = 0 <0.000011>
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd5ce3c0000 <0.000014>
read(7, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 1835 <0.000020>
read(7, "", 4096)                       = 0 <0.000011>
close(7)                                = 0 <0.000012>

###### took 0.309854s:
munmap(0x7fd5ce3c0000, 4096)            = 0 <0.000015>
poll([{fd=5, events=POLLIN|POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000011>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000010>
sendto(5, "\24\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 <0.000034>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000010>
sendto(5, "\253\260\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000335>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.309854>
read(5, "N\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000012>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000010>
read(5, "\1\0\0\0\0\0\0\0\253\260\0\0\253\1\0\0acqu\0*\0Chen Ge A"..., 62) = 62 <0.000016>
lstat("/var/spool/mail/araouf", {st_mode=S_IFREG|0600, st_size=3943191, ...}) = 0 <0.000052>
lgetxattr("/var/spool/mail/araouf", "security.selinux", "unconfined_u:object_r:file_t:s0", 255) = 32 <0.000021>
lstat("/var/spool/mail/araouf", {st_mode=S_IFREG|0600, st_size=3943191, ...}) = 0 <0.000014>
lgetxattr("/var/spool/mail/araouf", "system.posix_acl_access", 0x0, 0) = -1 ENODATA (No data available) <0.000017>
lgetxattr("/var/spool/mail/araouf", "system.posix_acl_default", 0x0, 0) = -1 ENODATA (No data available) <0.000015>
open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 7 <0.000016>
fstat(7, {st_mode=S_IFREG|0644, st_size=1835, ...}) = 0 <0.000010>
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd5ce3c0000 <0.000012>
read(7, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 1835 <0.000018>
read(7, "", 4096)                       = 0 <0.000010>
close(7)                                = 0 <0.000012>

###### took 0.309411:
munmap(0x7fd5ce3c0000, 4096)            = 0 <0.000015>
poll([{fd=5, events=POLLIN|POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000010>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000009>
sendto(5, "\24\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 <0.000035>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000010>
sendto(5, "\357\246\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000342>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.309411>
read(5, "V\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000012>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000010>
read(5, "\1\0\0\0\0\0\0\0\357\246\0\0\253\1\0\0pthaya\0*\0Parthip"..., 70) = 70 <0.000018>
lstat("/var/spool/mail/ptsao", {st_mode=S_IFREG|0600, st_size=515358, ...}) = 0 <0.000025>
lgetxattr("/var/spool/mail/ptsao", "security.selinux", "unconfined_u:object_r:file_t:s0", 255) = 32 <0.000021>
lstat("/var/spool/mail/ptsao", {st_mode=S_IFREG|0600, st_size=515358, ...}) = 0 <0.000014>
lgetxattr("/var/spool/mail/ptsao", "system.posix_acl_access", 0x0, 0) = -1 ENODATA (No data available) <0.000019>
lgetxattr("/var/spool/mail/ptsao", "system.posix_acl_default", 0x0, 0) = -1 ENODATA (No data available) <0.000014>
open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 7 <0.000017>
fstat(7, {st_mode=S_IFREG|0644, st_size=1835, ...}) = 0 <0.000010>
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd5ce3c0000 <0.000012>
read(7, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 1835 <0.000019>
read(7, "", 4096)                       = 0 <0.000009>
close(7)                                = 0 <0.000011>

###### took 0.290189:
munmap(0x7fd5ce3c0000, 4096)            = 0 <0.000015>
poll([{fd=5, events=POLLIN|POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000011>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000010>
sendto(5, "\24\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 <0.000034>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000010>
sendto(5, "\10\270\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000391>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.290189>
read(5, "N\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000012>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000010>
read(5, "\1\0\0\0\0\0\0\0\10\270\0\0\327\1\0\0taberk\0*\0Thomas "..., 62) = 62 <0.000017>
lstat("/var/spool/mail/taha", {st_mode=S_IFREG|0600, st_size=293648143, ...}) = 0 <0.000018>
lgetxattr("/var/spool/mail/taha", "security.selinux", "unconfined_u:object_r:file_t:s0", 255) = 32 <0.000022>
lstat("/var/spool/mail/taha", {st_mode=S_IFREG|0600, st_size=293648143, ...}) = 0 <0.000013>
lgetxattr("/var/spool/mail/taha", "system.posix_acl_access", 0x0, 0) = -1 ENODATA (No data available) <0.000021>
lgetxattr("/var/spool/mail/taha", "system.posix_acl_default", 0x0, 0) = -1 ENODATA (No data available) <0.000014>
open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 7 <0.000021>
fstat(7, {st_mode=S_IFREG|0644, st_size=1835, ...}) = 0 <0.000010>
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd5ce3c0000 <0.000015>
read(7, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 1835 <0.000033>
read(7, "", 4096)                       = 0 <0.000011>
close(7)                                = 0 <0.000013>
=====

====== iostat around the time ls -l was run =====
[root@dovecot2 ~]# iostat -xmz 5
Linux 2.6.32-358.18.1.el6.x86_64 (dovecot2.sri.utoronto.ca)     11/12/2013      _x86_64_        (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.56    0.00    0.17    0.06    0.00   99.21

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.01    19.88    0.14   26.86     0.00     0.18    14.18     0.02    0.57   0.28   0.75
sdb               0.23     0.08    1.75    1.98     0.04     0.02    30.38     0.01    2.81   1.26   0.47
sdc               0.00     0.03   13.78    0.71     1.34     0.05   196.35     0.02    1.60   0.77   1.11
dm-0              0.00     0.00    0.09    2.42     0.00     0.01     8.49     0.00    1.18   0.52   0.13
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    3.09   0.27   0.00
dm-2              0.00     0.00    0.06   44.33     0.00     0.17     8.14     0.04    0.90   0.14   0.63

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.31    0.00    0.46    0.10    0.00   98.12

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00    27.40    1.00   65.80     0.02     0.36    11.78     0.04    0.61   0.43   2.86
sdb               0.00     0.00    1.80    0.00     0.03     0.00    31.11     0.01    6.44   2.00   0.36
sdc               0.00     1.00    0.00    4.80     0.00     0.03    11.79     0.03    6.88   0.46   0.22
dm-0              0.00     0.00    0.80    1.20     0.02     0.00    24.00     0.01    6.10   6.10   1.22
dm-2              0.00     0.00    0.00   92.00     0.00     0.36     8.00     0.06    0.62   0.19   1.76

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.76    0.00    1.62    0.59    0.00   92.03

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00  1045.60    0.80  959.40     0.05     7.83    16.82     0.44    0.46   0.21  19.98
sdb               0.00     0.00    1.40    1.20     0.01     0.01     9.23     0.00    0.62   0.54   0.14
sdc               0.00     0.00    0.00    0.60     0.00     0.01    18.67     0.00    0.33   0.33   0.02
dm-0              0.00     0.00    0.80    8.00     0.05     0.03    20.00     0.02    2.05   1.41   1.24
dm-2              0.00     0.00    0.00 1997.00     0.00     7.80     8.00     1.24    0.62   0.09  18.86

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.38    0.00    1.29    0.67    0.00   93.66

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00  1204.40    0.00 1074.60     0.00     8.90    16.97     0.41    0.38   0.20  21.04
sdb               0.20     0.00    2.00    0.00     0.01     0.00    13.60     0.01    6.80   4.00   0.80
dm-0              0.00     0.00    0.00    9.00     0.00     0.04     8.00     0.01    1.27   0.42   0.38
dm-2              0.00     0.00    0.00 2270.00     0.00     8.87     8.00     1.33    0.59   0.09  20.74

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.24    0.00    1.16    0.58    0.00   94.02

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00  1159.20    0.00  985.80     0.00     8.38    17.41     0.34    0.35   0.18  17.84
sdb               0.00     0.00    0.20    1.40     0.00     0.01     8.62     0.00    0.75   0.75   0.12
sdc               0.00     0.00    0.00    0.20     0.00     0.00    16.00     0.00    1.00   1.00   0.02
dm-0              0.00     0.00    0.00    5.00     0.00     0.02     8.00     0.00    0.24   0.24   0.12
dm-2              0.00     0.00    0.00 2140.00     0.00     8.36     8.00     1.23    0.58   0.08  17.90

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.77    0.00    1.14    0.54    0.00   95.55

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   768.20    0.00  847.40     0.00     6.31    15.25     0.36    0.43   0.19  16.40
sdb               0.00     0.00    0.00    2.00     0.00     0.01     7.00     0.00    0.10   0.10   0.02
sdc               0.00     0.00    0.00    2.00     0.00     0.04    37.60     0.00    0.70   0.40   0.08
dm-0              0.00     0.00    0.00   18.80     0.00     0.07     8.00     0.09    4.84   0.27   0.50
dm-2              0.00     0.00    0.00 1596.80     0.00     6.24     8.00     0.84    0.52   0.10  16.22
=====

===== vmstat =====
[root@dovecot2 ~]# vmstat 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0   3476 785664 223836 29329692    0    0    88    16    1    1  1  0 99  0  0
 1  0   3476 783512 223844 29330448    0    0    84  3383 2210 3985  3  1 96  0  0
 0  1   3476 779328 223848 29331208    0    0    20  9254 4530 9894  5  2 92  1  0
 2  0   3476 778616 223856 29331956    0    0     1  9104 4160 10909  4  1 94  1  0
 1  0   3476 777584 223864 29332932    0    0     0 10056 4428 10263  4  1 94  1  0
 0  0   3476 778312 223872 29333044    0    0     2   660  921 1871  1  0 99  0  0
=====

Really appreciate your willingness to help and your time spent on this!

Qing

On 04/11/2013 1:42 AM, Stan Hoeppner wrote:
On 11/1/2013 2:11 PM, Qing Chang wrote:
I have collected some info following the FAQ guide. It may still
not be enough. Please let me know.
You say /var/spool/mail contains 1360 "items".  That's ambiguous.  You
mean 1360 user mbox files, yes?

# time to run ls -l (1360 items):
time ls -l /var/spool/mail
real    0m10.271s
user    0m0.097s
sys     0m0.162s
The combined CPU execution time is apparently 0.259s.  Something else is
adding 10s+ of delay.  mbox is notorious for locking problems.  I wonder
if stat() is waiting on release of exclusive write locks being held by
your MDA during delivery.  This would also tend to explain the
variability you demonstrated, with one run being 10s and another being
7s--fewer files locked on the second run.

Run 'ls' with "strace -T" to find out which system call is adding the
delay.  Run perf-lock concurrently to see the locks.

More comments below.


# Oct 29, 2013
#
# XFS slowness with ls du etc.

kernel version (uname -a)
Linux dovecot2.sri.utoronto.ca 2.6.32-358.18.1.el6.x86_64 #1 SMP Fri Aug 2 17:04:38 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux

xfsprogs version (xfs_repair -V)
xfs_repair version 3.1.1

number of CPUs: 16 virtual CPUs

contents of /proc/meminfo
MemTotal:       32878632 kB
MemFree:          310260 kB
Buffers:          220496 kB
Cached:         29971428 kB
SwapCached:          212 kB
Active:          9726912 kB
Inactive:       21180124 kB
Active(anon):     385040 kB
Inactive(anon):   330260 kB
Active(file):    9341872 kB
Inactive(file): 20849864 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       6160376 kB
SwapFree:        6159896 kB
Dirty:               772 kB
Writeback:             0 kB
AnonPages:        714912 kB
Mapped:            52696 kB
Shmem:               184 kB
Slab:            1045988 kB
SReclaimable:     776440 kB
SUnreclaim:       269548 kB
KernelStack:        8736 kB
PageTables:        57172 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    22599692 kB
Committed_AS:    1148180 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      335224 kB
VmallocChunk:   34333710552 kB
HardwareCorrupted:     0 kB
AnonHugePages:    180224 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       10240 kB
DirectMap2M:    33544192 kB

contents of /proc/mounts
[root@dovecot2 ~]# cat /proc/mounts
rootfs / rootfs rw 0 0
proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,seclabel,relatime 0 0
devtmpfs /dev devtmpfs rw,seclabel,relatime,size=16429456k,nr_inodes=4107364,mode=755 0 0
devpts /dev/pts devpts rw,seclabel,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /dev/shm tmpfs rw,seclabel,relatime 0 0
/dev/mapper/vg_ipa1-lv_root / ext4 rw,seclabel,relatime,barrier=1,data="" 0 0
none /selinux selinuxfs rw,relatime 0 0
devtmpfs /dev devtmpfs rw,seclabel,relatime,size=16429456k,nr_inodes=4107364,mode=755 0 0
/proc/bus/usb /proc/bus/usb usbfs rw,relatime 0 0
/dev/sda1 /boot ext4 rw,seclabel,relatime,barrier=1,data="" 0 0
/dev/mapper/vg_ipa1-lv_home /var ext4 rw,seclabel,relatime,barrier=1,data="" 0 0
/dev/sdb1 /mailhomes xfs rw,seclabel,relatime,attr2,delaylog,nobarrier,inode64,usrquota 0 0
/dev/sdc1 /var/spool/mail xfs rw,seclabel,relatime,attr2,delaylog,nobarrier,inode64,usrquota 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0

contents of /proc/partitions
major minor  #blocks  name

   8        0   83886080 sda
   8        1     512000 sda1
   8        2   83373056 sda2
   8       16 5368709120 sdb
   8       17 5368707072 sdb1
   8       32 2147483648 sdc
   8       33 2147481600 sdc1
 253        0   56729600 dm-0
 253        1    6160384 dm-1
 253        2   20480000 dm-2

RAID layout (hardware and/or software): Hardware RAID6 IBM Storwize V7000
What we're looking for here is the RAID geometry of the local disk
device, logical volume, SAN LUN, etc, underlying the XFS filesystem,
i.e. number of disks in the array, RAID level, stripe unit and stripe width.

LVM configuration: XFS not configured

type of disks you are using: SAS dirves

write cache status of drives: V7000 has 16 GB of cache

size of BBWC and mode it is running in: N/A
The 16GB of BBWC in the V7000 defaults to write-back mode, or has been
manually enabled.  If not you're playing Russian roulette, because
you're mounting with nobarrier.  You only disable barriers if you have
persistent write cache, either battery or flash based.

xfs_info output on the filesystem in question
meta-data=""              isize=256    agcount=4, agsize=134217600 blks
         =                       sectsz=512   attr=2
data     =                       ""   blocks=536870400, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=262143, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
2TB seems rather large for a 1360 user mail spool directory, given new
mail is apparently moved from INBOX to other folders in /mailhomes.

Additionally, given the innate parallelism in a mail workload, 4 AGs is
far less than optimal.  XFS throughput with a highly parallel workload
is limited by the number of allocation groups.  This filesystem was
apparently created with mkfs.xfs defaults.  It should have been created
with many more AGs, 16 minimum.  Same goes for the /mailhomes filesystem.

dmesg output showing all error messages and stack traces: N/A

# time to run ls -l (1360 items):
time ls -l /var/spool/mail
real    0m10.271s
user    0m0.097s
sys     0m0.162s

# iostat when ls -l is running:
[root@dovecot2 ~]# date;iostat -xdmz 5
Tue Oct 29 14:44:28 EDT 2013
Linux 2.6.32-358.18.1.el6.x86_64 (dovecot2.sri.utoronto.ca)     10/29/2013      _x86_64_        (16 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.02    17.87    0.17   24.49     0.01     0.17    14.16     0.01    0.56   0.28   0.68
sdb               0.19     0.08    1.62    1.94     0.04     0.02    33.47     0.01    2.76   1.23   0.44
sdc               0.00     0.03   11.57    0.69     1.10     0.05   192.23     0.02    1.65   0.78   0.96
dm-0              0.00     0.00    0.11    2.24     0.00     0.01     8.70     0.00    1.14   0.51   0.12
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    1.26   0.24   0.00
dm-2              0.00     0.00    0.08   40.11     0.00     0.16     8.18     0.03    0.86   0.14   0.57

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   155.00    0.00  127.20     0.00     1.10    17.75     0.06    0.45   0.25   3.22
sdb               0.00     0.20    0.00   11.60     0.00     0.07    11.81     0.05    4.07   0.17   0.20
sdc               0.00     0.00    0.00    0.40     0.00     0.00    12.00     0.00    0.50   0.50   0.02
dm-0              0.00     0.00    0.00    1.20     0.00     0.00     8.00     0.01    4.50   4.50   0.54
dm-2              0.00     0.00    0.00  281.40     0.00     1.10     8.00     0.19    0.68   0.10   2.74

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00  1480.80    0.00  627.20     0.00     8.23    26.89     0.28    0.45   0.20  12.70
sdb               0.00     0.00    0.40    4.00     0.00     0.03    18.27     0.00    1.05   1.05   0.46
sdc               0.00     0.00    0.00    1.40     0.00     0.00     6.71     0.00    0.14   0.14   0.02
dm-0              0.00     0.00    0.00   17.20     0.00     0.07     8.00     0.02    1.09   0.09   0.16
dm-2              0.00     0.00    0.00 2090.20     0.00     8.16     8.00     1.65    0.79   0.06  12.52

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00  1331.00    0.00  629.80     0.00     7.66    24.91     0.28    0.44   0.22  13.64
sdb               0.00     0.00    0.00    2.80     0.00     0.02    14.07     0.00    0.79   0.57   0.16
sdc               0.00     0.00    0.00    1.40     0.00     0.02    30.14     0.00    0.57   0.57   0.08
dm-0              0.00     0.00    0.00    1.20     0.00     0.00     8.00     0.00    3.83   3.83   0.46
dm-2              0.00     0.00    0.00 1959.60     0.00     7.65     8.00     1.43    0.73   0.07  13.20

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     6.60    0.00    0.60     0.00     0.03    96.00     0.00    1.33   1.33   0.08
sdb               0.00     0.20    0.00    0.80     0.00     0.00     8.25     0.00    0.75   0.75   0.06
sdc               0.00     0.20    0.00    0.20     0.00     0.00    16.00     0.00    0.00   0.00   0.00
dm-0              0.00     0.00    0.00    6.20     0.00     0.02     8.00     0.01    0.97   0.03   0.02
dm-2              0.00     0.00    0.00    1.00     0.00     0.00     8.00     0.00    3.00   0.60   0.06

# a second run of ls -l:
[root@dovecot2 ~]# date;time ls -l /var/spool/mail
Fri Nov  1 15:04:43 EDT 2013
real    0m7.814s
user    0m0.096s
sys     0m0.139s

[root@dovecot2 ~]# date;iostat -xmdz 5
Fri Nov  1 15:04:41 EDT 2013
Linux 2.6.32-358.18.1.el6.x86_64 (dovecot2.sri.utoronto.ca)     11/01/2013      _x86_64_        (16 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.01    21.20    0.16   28.28     0.00     0.19    14.27     0.02    0.57   0.27   0.78
sdb               0.22     0.09    1.70    2.19     0.04     0.02    31.03     0.01    2.83   1.20   0.47
sdc               0.00     0.04   12.63    0.75     1.21     0.05   192.59     0.02    1.60   0.77   1.04
dm-0              0.00     0.00    0.11    2.46     0.00     0.01     8.57     0.00    1.17   0.51   0.13
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    2.03   0.21   0.00
dm-2              0.00     0.00    0.07   47.02     0.00     0.18     8.15     0.04    0.93   0.14   0.65

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   730.20    0.00 1251.60     0.00     7.76    12.69     0.52    0.42   0.21  25.86
sdb               0.00     1.00    0.80   25.40     0.01     0.16    12.62     0.01    0.37   0.18   0.46
sdc               0.00     0.00    0.00    2.60     0.00     0.03    20.62     0.00    0.31   0.23   0.06
dm-0              0.00     0.00    0.00    0.20     0.00     0.00     8.00     0.00    3.00   3.00   0.06
dm-2              0.00     0.00    0.00 1981.00     0.00     7.74     8.00     1.04    0.52   0.13  25.96

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00  1544.20    0.00 2844.60     0.00    17.14    12.34     1.10    0.39   0.21  58.34
sdb               0.00     0.00    0.00    1.20     0.00     0.01     9.33     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.60     0.00     0.00     8.00     0.00    3.00   1.00   0.06
dm-0              0.00     0.00    0.00    3.00     0.00     0.01     8.00     0.00    0.07   0.07   0.02
dm-2              0.00     0.00    0.00 4386.00     0.00    17.13     8.00     2.19    0.50   0.13  58.46

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   104.80    0.00  212.80     0.00     1.24    11.94     0.09    0.41   0.21   4.44
sdb               0.00     0.00    0.20    3.00     0.00     0.01     9.81     0.00    0.50   0.50   0.16
sdc               0.00     0.20    0.00    2.80     0.00     0.02    16.86     0.01    4.43   0.79   0.22
dm-0              0.00     0.00    0.00    0.40     0.00     0.00     8.00     0.00    5.50   5.50   0.22
dm-2              0.00     0.00    0.00  317.00     0.00     1.24     8.00     0.15    0.47   0.13   4.24

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     6.60    0.00    1.20     0.00     0.03    52.00     0.00    0.33   0.33   0.04
sdb               0.00     0.20    0.00    1.00     0.00     0.00     9.60     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.20     0.00     0.00     2.00     0.00    1.00   1.00   0.02
dm-0              0.00     0.00    0.00    4.40     0.00     0.02     8.00     0.00    0.95   0.05   0.02
dm-2              0.00     0.00    0.00    3.40     0.00     0.01     8.00     0.00    0.06   0.06   0.02

[root@dovecot2 ~]# date;vmstat 5
Fri Nov  1 15:04:39 EDT 2013
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0   1304 542644 229484 29885592    0    0    80    17    1    1  1  0 99  0  0
 0  1   1304 542752 229484 29885696    0    0     0  2211 1518 1281  1  0 98  0  0
 0  1   1304 544736 229484 29885696    0    0     6 17216 7535 7631  3  2 94  2  0
 0  0   1304 543988 229484 29885696    0    0     0  7403 3782 3889  1  1 97  1  0
 0  0   1304 543604 229484 29885712    0    0     1   215  399  277  0  0 100  0  0


<Prev in Thread] Current Thread [Next in Thread>